Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes#25
Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes#25renecannao merged 9 commits intov3.1-vecfrom
Conversation
Implement BM25-ranked full-text search capability for MySQL/MariaDB tables using SQLite-based external FTS index. Changes: - Add MySQL_FTS class for managing SQLite FTS indexes - Add FTS tools: fts_index_table, fts_search, fts_reindex, fts_delete_index, fts_list_indexes, fts_rebuild_all - Add thread-safe FTS lifecycle management with fts_lock mutex - Add reset_fts_path() for runtime FTS database path configuration - Add comprehensive FTS test scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh)
Add full-text search (FTS) tools to MCP query server
- Add special build rule in Makefile for anomaly_detection-t that includes: - $(OBJ) for global variables (GloVars, GloGATH) - -Wl,--allow-multiple-definition to allow test's main() to override ProxySQL's - ClickHouse client libraries (libclickhouse-cpp-lib.a, libzstdstatic.a, liblz4.a) - SQLite rembed library (libsqlite_rembed.a) - -lscram -lusual for PostgreSQL SCRAM support - Add stub function SQLite3_Server_session_handler required by SQLite3_Server.cpp Resolves compilation errors for anomaly_detection-t test.
- Add #include <string> for C++ std::string support - Add #include <cmath> for sqrt() function - Change format %lld to %ld for chrono duration types (long int, not long long) Resolves compilation errors for vector_db_performance-t test.
- Add #include <string> for C++ std::string support Resolves compilation errors for ai_llm_retry_scenarios-t test.
…leanup
Security fixes:
- Add escape_identifier() helper for proper SQLite identifier escaping
- Replace sanitize_name() with allowlist validation (ASCII letters, digits, underscore only)
- Fix MATCH clause FTS5 operator injection by wrapping query in double quotes
- Apply escape_identifier() to all DDL statements (CREATE, DROP, triggers)
Memory safety fixes:
- Replace VLA with std::vector in MySQL_FTS::init(), add delete on error path
- Fix memory leak: free error string before return in list_indexes()
- Fix reindex_json["error"] potential exception using .value() with default
Thread safety fixes:
- reinit_fts(): Add mutex lock around pointer swap
- reset_fts_path(): Move blocking init() outside lock, only swap pointer under lock
Code cleanup:
- Remove 7 debug fprintf statements from Query_Tool_Handler.cpp
- Remove unused #include <memory> from MySQL_FTS.h
Test script security fixes:
- Use MYSQL_PWD environment variable instead of -p"..." for password
- Add escape_sql() function and apply to INSERT statement
- Fix CURL_OPTS quoting: ${CURL_OPTS:+"${CURL_OPTS}"}
- Remove unused FTS_INDEX_NAME and SEARCH_QUERIES variables
Documentation fixes:
- Fix bare URL to markdown link format
- Add code block language identifiers (text, bash)
Summary of ChangesHello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refines the recently introduced Full-Text Search (FTS) functionality by addressing critical feedback from AI code reviewers. The primary focus is on bolstering the security, memory safety, and thread safety of the FTS implementation, ensuring a more robust and reliable system. Additionally, it integrates the FTS capabilities more deeply into the existing toolset, provides extensive user documentation, and enhances test coverage, making the feature more stable and usable. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
📝 WalkthroughSummary by CodeRabbitRelease Notes
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughThis PR introduces a complete Full-Text Search (FTS) system for MySQL tables using SQLite FTS5 backend. It adds a new MySQL_FTS class for index management, six new MCP tools (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all), thread-safe integration with MySQL_Tool_Handler, runtime-configurable FTS database path, and extensive test coverage with documentation. Changes
Sequence DiagramssequenceDiagram
participant Client
participant MySQL_Tool_Handler
participant MySQL_FTS
participant SQLite DB
participant MySQL
Client->>MySQL_Tool_Handler: fts_index_table(schema, table, columns, pk, where)
MySQL_Tool_Handler->>MySQL_FTS: index_table(schema, table, columns, pk, where)
MySQL_FTS->>SQLite DB: check index exists
alt Index exists
SQLite DB-->>MySQL_FTS: true
else Create new index
MySQL_FTS->>SQLite DB: create data_table
MySQL_FTS->>SQLite DB: create fts_table (FTS5)
MySQL_FTS->>SQLite DB: create triggers for sync
end
MySQL_FTS->>MySQL: execute SELECT query
MySQL-->>MySQL_FTS: result set
MySQL_FTS->>SQLite DB: batch insert into data_table
MySQL_FTS->>SQLite DB: update fts_indexes metadata
SQLite DB-->>MySQL_FTS: success
MySQL_FTS-->>MySQL_Tool_Handler: JSON result
MySQL_Tool_Handler-->>Client: FTS index created
sequenceDiagram
participant Client
participant MySQL_Tool_Handler
participant MySQL_FTS
participant SQLite DB
Client->>MySQL_Tool_Handler: fts_search(query, schema, table, limit, offset)
MySQL_Tool_Handler->>MySQL_FTS: search(query, schema, table, limit, offset)
MySQL_FTS->>SQLite DB: retrieve index metadata
SQLite DB-->>MySQL_FTS: matching indexes
loop For each matching index
MySQL_FTS->>SQLite DB: FTS5 search with BM25 ranking
SQLite DB-->>MySQL_FTS: ranked results with snippets
end
MySQL_FTS->>MySQL_FTS: aggregate & paginate results
MySQL_FTS-->>MySQL_Tool_Handler: JSON results array
MySQL_Tool_Handler-->>Client: search results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a Full Text Search (FTS) module to ProxySQL, leveraging SQLite's FTS5 extension for fast, indexed searching of MySQL data. The architecture includes MCP query endpoint integration, a Query Tool Handler, and a MySQL Tool Handler wrapping the MySQL_FTS class. Configuration is managed via ProxySQL's admin interface, with variables for the FTS database path and MySQL backend connection details. The FTS Tools Reference details six tools for indexing, searching, and managing FTS indexes. The code review highlights several key aspects of the changes, including robust input sanitization and identifier escaping to prevent SQL injection vulnerabilities, thread-safe access to the FTS database, runtime configuration changes for the FTS database path, and the inclusion of comprehensive test cases. Additionally, there were format specifier corrections for printing long int values and a note about a future date in the version history.
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->index_table(schema, table, columns, primary_key, where_clause, this); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } | ||
|
|
||
| std::string MySQL_Tool_Handler::fts_search( | ||
| const std::string& query, | ||
| const std::string& schema, | ||
| const std::string& table, | ||
| int limit, | ||
| int offset | ||
| ) { | ||
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->search(query, schema, table, limit, offset); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } | ||
|
|
||
| std::string MySQL_Tool_Handler::fts_list_indexes() { | ||
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->list_indexes(); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } | ||
|
|
||
| std::string MySQL_Tool_Handler::fts_delete_index(const std::string& schema, const std::string& table) { | ||
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->delete_index(schema, table); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } | ||
|
|
||
| std::string MySQL_Tool_Handler::fts_reindex(const std::string& schema, const std::string& table) { | ||
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->reindex(schema, table, this); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } | ||
|
|
||
| std::string MySQL_Tool_Handler::fts_rebuild_all() { | ||
| pthread_mutex_lock(&fts_lock); | ||
| if (!fts) { | ||
| json result; | ||
| result["success"] = false; | ||
| result["error"] = "FTS not initialized"; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return result.dump(); | ||
| } | ||
|
|
||
| std::string out = fts->rebuild_all(this); | ||
| pthread_mutex_unlock(&fts_lock); | ||
| return out; | ||
| } |
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | ||
| const size_t MAX_NAME_LEN = 100; | ||
| std::string sanitized; | ||
| // Allowlist: only ASCII letters, digits, underscore | ||
| for (char c : name) { | ||
| if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || | ||
| (c >= '0' && c <= '9') || c == '_') { | ||
| sanitized.push_back(c); | ||
| } | ||
| } | ||
| // Prevent leading digit (SQLite identifiers can't start with digit) | ||
| if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') { | ||
| sanitized.insert(sanitized.begin(), '_'); | ||
| } | ||
| // Enforce maximum length | ||
| if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); | ||
| return sanitized; |
There was a problem hiding this comment.
| std::string MySQL_FTS::escape_identifier(const std::string& identifier) { | ||
| std::string escaped; | ||
| escaped.reserve(identifier.length() * 2 + 2); | ||
| escaped.push_back('`'); | ||
| for (char c : identifier) { | ||
| escaped.push_back(c); | ||
| if (c == '`') escaped.push_back('`'); // Double backticks | ||
| } | ||
| escaped.push_back('`'); | ||
| return escaped; |
| int MySQL_FTS::create_index_tables(const std::string& schema, const std::string& table) { | ||
| std::string data_table = get_data_table_name(schema, table); | ||
| std::string fts_table = get_fts_table_name(schema, table); | ||
| std::string escaped_data = escape_identifier(data_table); | ||
| std::string escaped_fts = escape_identifier(fts_table); | ||
|
|
||
| // Create data table | ||
| std::ostringstream create_data_sql; | ||
| create_data_sql << "CREATE TABLE IF NOT EXISTS " << escaped_data << " (" | ||
| " rowid INTEGER PRIMARY KEY AUTOINCREMENT," | ||
| " schema_name TEXT NOT NULL," | ||
| " table_name TEXT NOT NULL," | ||
| " primary_key_value TEXT NOT NULL," | ||
| " content TEXT NOT NULL," | ||
| " metadata TEXT" | ||
| ");"; | ||
|
|
||
| if (!db->execute(create_data_sql.str().c_str())) { | ||
| proxy_error("Failed to create data table %s\n", data_table.c_str()); | ||
| return -1; | ||
| } | ||
|
|
||
| // Create FTS5 virtual table with external content | ||
| std::ostringstream create_fts_sql; | ||
| create_fts_sql << "CREATE VIRTUAL TABLE IF NOT EXISTS " << escaped_fts << " USING fts5(" | ||
| " content, metadata," | ||
| " content=" << escaped_data << "," | ||
| " content_rowid='rowid'," | ||
| " tokenize='porter unicode61'" | ||
| ");"; | ||
|
|
||
| if (!db->execute(create_fts_sql.str().c_str())) { | ||
| proxy_error("Failed to create FTS table %s\n", fts_table.c_str()); | ||
| return -1; | ||
| } | ||
|
|
||
| // Create triggers for automatic sync (populate the FTS table) | ||
| std::string base_name = sanitize_name(schema) + "_" + sanitize_name(table); | ||
| std::string escaped_base = escape_identifier(base_name); | ||
|
|
||
| // Drop existing triggers if any | ||
| db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ai_" + base_name)).c_str()); | ||
| db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ad_" + base_name)).c_str()); | ||
| db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_au_" + base_name)).c_str()); | ||
|
|
||
| // AFTER INSERT trigger | ||
| std::ostringstream ai_sql; | ||
| ai_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ai_" + base_name) | ||
| << " AFTER INSERT ON " << escaped_data << " BEGIN" | ||
| << " INSERT INTO " << escaped_fts << "(rowid, content, metadata)" | ||
| << " VALUES (new.rowid, new.content, new.metadata);" | ||
| << "END;"; | ||
| db->execute(ai_sql.str().c_str()); | ||
|
|
||
| // AFTER DELETE trigger | ||
| std::ostringstream ad_sql; | ||
| ad_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ad_" + base_name) | ||
| << " AFTER DELETE ON " << escaped_data << " BEGIN" | ||
| << " INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)" | ||
| << " VALUES ('delete', old.rowid, old.content, old.metadata);" | ||
| << "END;"; | ||
| db->execute(ad_sql.str().c_str()); | ||
|
|
||
| // AFTER UPDATE trigger | ||
| std::ostringstream au_sql; | ||
| au_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_au_" + base_name) | ||
| << " AFTER UPDATE ON " << escaped_data << " BEGIN" | ||
| << " INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)" | ||
| << " VALUES ('delete', old.rowid, old.content, old.metadata);" | ||
| << " INSERT INTO " << escaped_fts << "(rowid, content, metadata)" | ||
| << " VALUES (new.rowid, new.content, new.metadata);" | ||
| << "END;"; | ||
| db->execute(au_sql.str().c_str()); | ||
|
|
||
| return 0; | ||
| } |
| std::string fts_literal = "\""; | ||
| for (char c : query) { | ||
| fts_literal.push_back(c); | ||
| if (c == '"') fts_literal.push_back('"'); // Double quotes | ||
| } | ||
| fts_literal.push_back('"'); |
| int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) { | ||
| proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str()); | ||
|
|
||
| // Check if directory exists (SQLite can't create directories) | ||
| std::string::size_type last_slash = fts_path.find_last_of("/"); | ||
| if (last_slash != std::string::npos && last_slash > 0) { | ||
| std::string dir = fts_path.substr(0, last_slash); | ||
| struct stat st; | ||
| if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) { | ||
| proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n", | ||
| fts_path.c_str(), dir.c_str()); | ||
| return -1; | ||
| } | ||
| } | ||
|
|
||
| // First, test if we can open the new database (outside lock) | ||
| MySQL_FTS* new_fts = new MySQL_FTS(fts_path); | ||
| if (!new_fts) { | ||
| proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n"); | ||
| return -1; | ||
| } | ||
|
|
||
| if (new_fts->init() != 0) { | ||
| proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str()); | ||
| delete new_fts; | ||
| return -1; // Return error WITHOUT closing old FTS | ||
| } | ||
|
|
||
| // Success! Now swap the pointer under lock | ||
| pthread_mutex_lock(&fts_lock); | ||
| MySQL_FTS* old_fts = fts; | ||
| fts = new_fts; | ||
| pthread_mutex_unlock(&fts_lock); | ||
| if (old_fts) delete old_fts; | ||
|
|
||
| proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str()); | ||
| return 0; | ||
| } |
|
|
||
| | Version | Date | Changes | | ||
| |---------|------|---------| | ||
| | 0.1.0 | 2025-01 | Initial implementation | |
|
|
||
| if (error) { | ||
| result["error"] = "Failed to list indexes: " + std::string(error); | ||
| (*proxy_sqlite3_free)(error); |
|
|
||
| ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size); | ||
| diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count()); | ||
| diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); |
|
|
||
| diag("Average time per concurrent operation: %lld microseconds", avg_time); | ||
| diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count()); | ||
| diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count()); |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In `@lib/MCP_Thread.cpp`:
- Around line 350-364: The code assigns variables.mcp_fts_path = strdup(value)
before calling mysql_tool_handler->reset_fts_path, so on reset failure the
stored path is left inconsistent; change the logic in the "fts_path" branch to
preserve the previous value: save the old pointer (old =
variables.mcp_fts_path), create a new copy (new_val = strdup(value)) but do not
overwrite variables.mcp_fts_path until reset succeeds; call
mysql_tool_handler->reset_fts_path(value) and on success assign
variables.mcp_fts_path = new_val (free old if non-null), on failure free new_val
and leave variables.mcp_fts_path as old, log the error and return -1; ensure no
leaks or double-frees when old is null.
In `@lib/MySQL_FTS.cpp`:
- Around line 483-493: The early-return path that handles (!indexes_result ||
indexes_result->rows.empty()) leaks memory when indexes_result is non-null;
modify that branch so that if indexes_result is non-null you free/delete/cleanup
it (e.g., delete indexes_result or call its provided free method) before
returning, and keep db->rdunlock() in the correct order (free indexes_result if
non-null, then rdunlock(), then set result fields and return). Ensure you
reference and handle the indexes_result variable from the execute_statement
call.
- Around line 329-334: The MySQL identifier usage in the selected_cols loop and
the final FROM clause is quoting names with backticks but not escaping embedded
backticks; add an escape helper (e.g., escape_mysql_identifier) mirroring
escape_identifier() behavior that duplicates any internal '`' and wraps the
result in surrounding backticks, then replace usages building mysql_query for
selected_cols, schema and table to call
escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and
escape_mysql_identifier(table) so all identifiers are safely escaped.
- Around line 92-109: sanitize_name can return an empty string when input has
only invalid characters; update MySQL_FTS::sanitize_name to guarantee a
non-empty, collision-resistant identifier by adding a deterministic fallback:
after sanitization and truncation (use MAX_NAME_LEN), if result is empty or
would collide, prepend or replace with a safe prefix (e.g., "_") and append a
short unique suffix derived from the original name (use std::hash<std::string>
of the original input truncated to a few hex chars) so symbols like
sanitize_name and MAX_NAME_LEN are used but the function always returns a valid,
bounded-length, and effectively unique name.
- Around line 326-338: The code in MySQL_FTS.cpp concatenates the user-provided
where_clause directly into the SQL built in the fts_index_table flow (see
mysql_query construction using selected_cols, schema, table and where_clause),
creating an SQL injection risk; fix it by changing the query construction to use
parameterized/prepared statements from your MySQL client API (build the SELECT
with column/backtick quoting as now, but replace dynamic predicates in
where_clause with named or positional placeholders and bind the corresponding
values), or if placeholders cannot represent arbitrary predicates, enforce a
strict whitelist/validation of allowed columns/operators and parse/escape values
before concatenation; locate the code that forms mysql_query and refactor to use
the client’s prepare/bind/execute API or add validation logic for where_clause
and document the expected safe format for callers.
In `@lib/MySQL_Tool_Handler.cpp`:
- Around line 1047-1058: The FTS methods (fts_index_table, fts_search,
fts_reindex, fts_rebuild_all, fts_delete_index) currently hold
pthread_mutex_lock(&fts_lock) across long-running I/O; refactor them to minimize
lock scope by copying the fts pointer under the lock (e.g., lock fts_lock, if
(!fts) return error, else MySQL_FTS* local_fts = fts; unlock fts_lock) and then
call local_fts->... (index_table/search/reindex/...) outside the mutex, or
alternatively add a clear comment above these functions that the mutex
intentionally serializes all FTS operations for correctness; ensure you
reference the same error behavior (returning JSON error when fts is null) and
preserve thread-safety during the pointer swap like reset_fts_path().
In `@scripts/mcp/test_mcp_fts_detailed.sh`:
- Around line 71-74: The mysql_exec function currently passes the password on
the command line (-p"${MYSQL_PASSWORD}") which exposes it; update mysql_exec to
stop using -p and instead invoke mysql with MYSQL_PWD set in the environment for
that invocation (e.g., set MYSQL_PWD="${MYSQL_PASSWORD}" only for the mysql
command) so the password does not appear in the process list; locate the
mysql_exec function and replace the argument-based password with an
environment-based MYSQL_PWD usage and ensure the variable is not exported beyond
the single invocation.
- Around line 251-260: The cleanup block guarded by CLEANUP calls
fts_delete_index twice and attempts to delete fts_test.orders again even though
it was already removed earlier; update the cleanup to avoid failing under set
-euo pipefail by either removing the redundant call that invokes tool_call
"fts_delete_index" for '{"schema":"fts_test","table":"orders"}' or make that
specific delete call tolerant of non-existent indexes (e.g., append a fallback
so the extract_tool_result/echo/jq pipeline does not cause a non-zero exit on
"not found" errors); ensure you modify the code that calls tool_call
"fts_delete_index" and the subsequent extract_tool_result/check logic to
implement the chosen approach.
In `@scripts/mcp/test_mcp_fts.sh`:
- Around line 205-240: The current fallback JSON parsing in extract_json_field
and check_json_bool is broken for jq-style dot-notation and complex paths (e.g.,
".success", ".results | length") and silently fails; update the script to
enforce jq presence at startup (fail fast with a clear error) OR rewrite the
fallback to canonicalize the field by stripping leading dots and support simple
dot-path traversal and array length checks used by extract_inner_result;
specifically modify the startup check to call command -v jq and exit with an
informative message if missing, and update extract_json_field/check_json_bool to
delegate to jq-only or to parse dot paths by splitting on '.' and traversing
nested objects/arrays so patterns like .success and .results|length return
correct values.
- Around line 509-514: The MCP tool responses are wrapped; replace uses of
extract_json_field with extract_inner_field when pulling numeric fields from the
wrapped fts_tool_call response (e.g., change total_matches, results | length,
row_count, rebuilt_count extractions). Locate occurrences where
extract_json_field is used alongside variables like total_matches, result_count,
row_count, rebuilt_count (and where check_json_bool already uses
extract_inner_result) and swap to extract_inner_field so the code unwraps the
MCP JSON-RPC result before evaluating numeric comparisons; keep the same
variable names and call sites (e.g., where response is parsed after
fts_tool_call and in routines that follow check_json_bool).
In `@test/tap/tests/Makefile`:
- Around line 171-173: The Makefile target anomaly_detection-t currently passes
the misspelled preprocessor define -DEXCLUDE_TRACKING_VARAIABLES; update that
flag to the correctly spelled -DEXCLUDE_TRACKING_VARIABLES in the
anomaly_detection-t rule so the code that checks for EXCLUDE_TRACKING_VARIABLES
is enabled as intended.
In `@test/tap/tests/vector_db_performance-t.cpp`:
- Around line 380-381: The second diagnostic uses the wrong/ inconsistent printf
format for total_duration.count(); change the format to match the 64-bit type
(use %lld) and ensure you cast total_duration.count() to (long long) in the diag
call so both diag("Average time per concurrent operation: %lld microseconds",
avg_time) and diag("Total time for %d operations: %lld microseconds",
num_operations, (long long)total_duration.count()) use the same 64-bit specifier
and types; locate these calls (diag with avg_time and diag with
total_duration.count()) and update the format/specifier and cast accordingly.
- Line 325: The diag call uses the wrong printf format specifier for
insert_duration.count(); update the format to safely match the underlying type
by either using "%lld" and casting insert_duration.count() to (long long) or by
casting to long and using "%ld" consistently; change the diag invocation that
prints "Time to insert %zu entries: %ld ms" to use a matching specifier and cast
(e.g., "%zu entries: %lld ms", (long long)insert_duration.count()) so the format
matches insert_duration.count() and aligns with the other timing output.
🧹 Nitpick comments (8)
doc/MCP/FTS_USER_GUIDE.md (1)
1-1: Consider hyphenating "Full-Text Search" for grammatical correctness."Full-Text Search" is a compound adjective modifying "User Guide" and should be hyphenated for consistency with standard technical writing conventions. This applies to both the title and the Overview section (line 19).
Suggested fix
-# MCP Full Text Search (FTS) - User Guide +# MCP Full-Text Search (FTS) - User GuideAnd similarly on line 19:
-The MCP Full Text Search (FTS) module provides fast, indexed search capabilities +The MCP Full-Text Search (FTS) module provides fast, indexed search capabilitiesscripts/mcp/test_mcp_fts.sh (2)
147-150: Add curl timeouts to avoid hanging test runs.If the MCP endpoint is down,
curlcan block indefinitely. Add--connect-timeout/--max-time(optionally configurable via env).⏱️ Suggested timeout defaults
- response=$(curl -s -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \ + response=$(curl -s --connect-timeout 5 --max-time 30 -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \ -H "Content-Type: application/json" \ -d "${payload}" 2>/dev/null)
352-355: Drop unuseddelete_response.Shellcheck flags
delete_responseas unused. Either log failures or remove the variable.🧹 Remove unused variable
- local delete_response - delete_response=$(fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}") + fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}" >/dev/nulllib/Query_Tool_Handler.cpp (1)
220-227: Tool schema markscolumnsas string, but runtime expects an array.
create_tool_schematypes required params as strings, yetfts_index_tableexpects a JSON array (tests send an array). This mismatch can confuse schema-driven clients. Consider a custom schema for this tool wherecolumnsis an array of strings.♻️ Example custom schema for
fts_index_table- tools.push_back(create_tool_schema( - "fts_index_table", - "Create and populate a full-text search index for a MySQL table", - {"schema", "table", "columns", "primary_key"}, - {{"where_clause", "string"}} - )); + { + json schema = { + {"type", "object"}, + {"properties", { + {"schema", {{"type", "string"}, {"description", "schema parameter"}}}, + {"table", {{"type", "string"}, {"description", "table parameter"}}}, + {"columns", {{"type", "array"}, {"items", {{"type", "string"}}}, {"description", "columns to index"}}}, + {"primary_key", {{"type", "string"}, {"description", "primary_key parameter"}}}, + {"where_clause", {{"type", "string"}, {"description", "where_clause parameter"}}} + }}, + {"required", {"schema", "table", "columns", "primary_key"}} + }; + tools.push_back(create_tool_description( + "fts_index_table", + "Create and populate a full-text search index for a MySQL table", + schema + )); + }include/MySQL_FTS.h (1)
90-96: Delete copy and move operations to prevent accidental resource duplication.The class owns a raw
SQLite3DB*pointer that is deleted in the destructor. Since copy/move operations are not explicitly deleted, they're implicitly defined, which means shallow copies could lead to double-free or use-after-free if the class is ever used by value or moved. Although the current codebase only uses pointer instances, it's defensive to explicitly prevent copying/moving.🔒 Disable copy/move
MySQL_FTS(const std::string& path); + MySQL_FTS(const MySQL_FTS&) = delete; + MySQL_FTS& operator=(const MySQL_FTS&) = delete; + MySQL_FTS(MySQL_FTS&&) = delete; + MySQL_FTS& operator=(MySQL_FTS&&) = delete; /** * `@brief` Destructor */ ~MySQL_FTS();lib/MySQL_Tool_Handler.cpp (2)
117-138: Redundant implementations of FTS path reset logic.Both
reset_fts_path()(lines 117-138) andreinit_fts()(lines 1142-1179) implement nearly identical functionality for swapping the FTS instance. The key differences are:
reinit_fts()additionally checks if the directory existsreinit_fts()returnsintwhilereset_fts_path()returnsboolConsider consolidating into a single implementation to avoid code duplication and potential divergence in behavior.
♻️ Suggested consolidation
bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) { - MySQL_FTS* new_fts = NULL; - - // Initialize new FTS outside lock (blocking I/O) - if (!path.empty()) { - new_fts = new MySQL_FTS(path); - if (new_fts->init()) { - proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str()); - delete new_fts; - return false; - } - } - - // Swap pointer under lock (non-blocking) - pthread_mutex_lock(&fts_lock); - MySQL_FTS* old_fts = fts; - fts = new_fts; - pthread_mutex_unlock(&fts_lock); - if (old_fts) delete old_fts; - - return true; + return reinit_fts(path) == 0; }Then update
reinit_fts()to handle empty path case if needed, or keepreset_fts_path()as the primary implementation and havereinit_fts()call it after the directory check.Also applies to: 1142-1179
303-329: Remove debugfprintfstatements before merging.Multiple debug
fprintf(stderr, ...)statements remain throughout the file (e.g., lines 303, 309, 317, 319, 325, 329, 343-356, 401, 413, 419, 493-532, 537-566). The PR objectives mention "Code cleanup: removed seven debug fprintfs" but many still remain.These should be removed or converted to proper
proxy_debug()calls for production code.Also applies to: 343-356, 401-419, 493-532, 537-566
lib/MySQL_FTS.cpp (1)
359-430: Database lock held during entire indexing operation.The
wrlock()acquired at line 359 is held until line 430, encompassing potentially thousands of row insertions. This could block read operations (search,list_indexes) for extended periods.Consider committing in batches and releasing/reacquiring the lock between batches, or use a transaction with checkpoints.
| if (!strcmp(name, "fts_path")) { | ||
| if (variables.mcp_fts_path) | ||
| free(variables.mcp_fts_path); | ||
| variables.mcp_fts_path = strdup(value); | ||
| // Apply at runtime by resetting FTS in the existing handler | ||
| if (mysql_tool_handler) { | ||
| proxy_info("MCP: Applying new fts_path at runtime: %s\n", value); | ||
| if (!mysql_tool_handler->reset_fts_path(value)) { | ||
| proxy_error("Failed to reset FTS path at runtime\n"); | ||
| return -1; | ||
| } | ||
| } | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Rollback mcp_fts_path on reset failure to keep config consistent.
set_variable("fts_path") updates the stored path before calling reset_fts_path. If reset fails, the function returns -1 but leaves the new path saved, so config no longer reflects the active FTS DB. Preserve the previous value and restore on failure.
🔧 Preserve previous value on failure
- if (variables.mcp_fts_path)
- free(variables.mcp_fts_path);
- variables.mcp_fts_path = strdup(value);
- // Apply at runtime by resetting FTS in the existing handler
- if (mysql_tool_handler) {
- proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
- if (!mysql_tool_handler->reset_fts_path(value)) {
- proxy_error("Failed to reset FTS path at runtime\n");
- return -1;
- }
- }
-
- return 0;
+ char* old = variables.mcp_fts_path;
+ variables.mcp_fts_path = strdup(value);
+ // Apply at runtime by resetting FTS in the existing handler
+ if (mysql_tool_handler) {
+ proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
+ if (!mysql_tool_handler->reset_fts_path(value)) {
+ proxy_error("Failed to reset FTS path at runtime\n");
+ free(variables.mcp_fts_path);
+ variables.mcp_fts_path = old;
+ return -1;
+ }
+ }
+ if (old) free(old);
+ return 0;🤖 Prompt for AI Agents
In `@lib/MCP_Thread.cpp` around lines 350 - 364, The code assigns
variables.mcp_fts_path = strdup(value) before calling
mysql_tool_handler->reset_fts_path, so on reset failure the stored path is left
inconsistent; change the logic in the "fts_path" branch to preserve the previous
value: save the old pointer (old = variables.mcp_fts_path), create a new copy
(new_val = strdup(value)) but do not overwrite variables.mcp_fts_path until
reset succeeds; call mysql_tool_handler->reset_fts_path(value) and on success
assign variables.mcp_fts_path = new_val (free old if non-null), on failure free
new_val and leave variables.mcp_fts_path as old, log the error and return -1;
ensure no leaks or double-frees when old is null.
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | ||
| const size_t MAX_NAME_LEN = 100; | ||
| std::string sanitized; | ||
| // Allowlist: only ASCII letters, digits, underscore | ||
| for (char c : name) { | ||
| if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || | ||
| (c >= '0' && c <= '9') || c == '_') { | ||
| sanitized.push_back(c); | ||
| } | ||
| } | ||
| // Prevent leading digit (SQLite identifiers can't start with digit) | ||
| if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') { | ||
| sanitized.insert(sanitized.begin(), '_'); | ||
| } | ||
| // Enforce maximum length | ||
| if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); | ||
| return sanitized; | ||
| } |
There was a problem hiding this comment.
sanitize_name() may return empty string causing issues.
If the input contains only invalid characters, sanitize_name() returns an empty string. This could lead to invalid table names like fts_data__ or trigger name collisions.
🔧 Suggested fix
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
+ // Return fallback if empty
+ if (sanitized.empty()) {
+ sanitized = "_unnamed";
+ }
// Prevent leading digit (SQLite identifiers can't start with digit)
- if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
+ if (sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | |
| const size_t MAX_NAME_LEN = 100; | |
| std::string sanitized; | |
| // Allowlist: only ASCII letters, digits, underscore | |
| for (char c : name) { | |
| if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || | |
| (c >= '0' && c <= '9') || c == '_') { | |
| sanitized.push_back(c); | |
| } | |
| } | |
| // Prevent leading digit (SQLite identifiers can't start with digit) | |
| if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') { | |
| sanitized.insert(sanitized.begin(), '_'); | |
| } | |
| // Enforce maximum length | |
| if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); | |
| return sanitized; | |
| } | |
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | |
| const size_t MAX_NAME_LEN = 100; | |
| std::string sanitized; | |
| // Allowlist: only ASCII letters, digits, underscore | |
| for (char c : name) { | |
| if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || | |
| (c >= '0' && c <= '9') || c == '_') { | |
| sanitized.push_back(c); | |
| } | |
| } | |
| // Return fallback if empty | |
| if (sanitized.empty()) { | |
| sanitized = "_unnamed"; | |
| } | |
| // Prevent leading digit (SQLite identifiers can't start with digit) | |
| if (sanitized[0] >= '0' && sanitized[0] <= '9') { | |
| sanitized.insert(sanitized.begin(), '_'); | |
| } | |
| // Enforce maximum length | |
| if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); | |
| return sanitized; | |
| } |
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 92 - 109, sanitize_name can return an empty
string when input has only invalid characters; update MySQL_FTS::sanitize_name
to guarantee a non-empty, collision-resistant identifier by adding a
deterministic fallback: after sanitization and truncation (use MAX_NAME_LEN), if
result is empty or would collide, prepend or replace with a safe prefix (e.g.,
"_") and append a short unique suffix derived from the original name (use
std::hash<std::string> of the original input truncated to a few hex chars) so
symbols like sanitize_name and MAX_NAME_LEN are used but the function always
returns a valid, bounded-length, and effectively unique name.
| // Build MySQL query to fetch data | ||
| std::ostringstream mysql_query; | ||
| mysql_query << "SELECT "; | ||
| for (size_t i = 0; i < selected_cols.size(); i++) { | ||
| if (i > 0) mysql_query << ", "; | ||
| mysql_query << "`" << selected_cols[i] << "`"; | ||
| } | ||
|
|
||
| mysql_query << " FROM `" << schema << "`.`" << table << "`"; | ||
|
|
||
| if (!where_clause.empty()) { | ||
| mysql_query << " WHERE " << where_clause; | ||
| } |
There was a problem hiding this comment.
SQL injection risk: User-provided where_clause directly interpolated into query.
The where_clause parameter is directly concatenated into the MySQL query without validation or escaping (line 337). While the PR objectives mention security hardening, this vector remains open.
An attacker could inject arbitrary SQL through the where_clause parameter passed to fts_index_table.
🔒 Suggested mitigation
Consider one of these approaches:
- Validate that
where_clausecontains only safe SQL predicates - Use parameterized queries if the MySQL client library supports it
- At minimum, document the security implications and ensure this is only exposed to trusted callers
+ // WARNING: where_clause is passed directly to MySQL - ensure caller validates input
if (!where_clause.empty()) {
+ // Basic sanity check - reject obviously dangerous patterns
+ std::string upper_where = where_clause;
+ std::transform(upper_where.begin(), upper_where.end(), upper_where.begin(), ::toupper);
+ if (upper_where.find("INTO OUTFILE") != std::string::npos ||
+ upper_where.find("LOAD_FILE") != std::string::npos) {
+ result["error"] = "Dangerous pattern in where_clause";
+ return result.dump();
+ }
mysql_query << " WHERE " << where_clause;
}🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 326 - 338, The code in MySQL_FTS.cpp
concatenates the user-provided where_clause directly into the SQL built in the
fts_index_table flow (see mysql_query construction using selected_cols, schema,
table and where_clause), creating an SQL injection risk; fix it by changing the
query construction to use parameterized/prepared statements from your MySQL
client API (build the SELECT with column/backtick quoting as now, but replace
dynamic predicates in where_clause with named or positional placeholders and
bind the corresponding values), or if placeholders cannot represent arbitrary
predicates, enforce a strict whitelist/validation of allowed columns/operators
and parse/escape values before concatenation; locate the code that forms
mysql_query and refactor to use the client’s prepare/bind/execute API or add
validation logic for where_clause and document the expected safe format for
callers.
lib/MySQL_FTS.cpp
Outdated
| for (size_t i = 0; i < selected_cols.size(); i++) { | ||
| if (i > 0) mysql_query << ", "; | ||
| mysql_query << "`" << selected_cols[i] << "`"; | ||
| } | ||
|
|
||
| mysql_query << " FROM `" << schema << "`.`" << table << "`"; |
There was a problem hiding this comment.
Identifiers in MySQL query not properly escaped.
The column names from selected_cols and the schema/table names are wrapped in backticks but not escaped for embedded backticks. If a column name contains a backtick, this could cause SQL syntax errors or injection.
The code has escape_identifier() for SQLite but doesn't apply equivalent escaping for MySQL queries.
🔧 Suggested fix
Add a MySQL identifier escape helper or reuse the pattern from escape_identifier():
// Helper for MySQL identifiers (double backticks)
std::string escape_mysql_identifier(const std::string& id) {
std::string escaped = "`";
for (char c : id) {
escaped.push_back(c);
if (c == '`') escaped.push_back('`');
}
escaped.push_back('`');
return escaped;
}Then use it:
for (size_t i = 0; i < selected_cols.size(); i++) {
if (i > 0) mysql_query << ", ";
- mysql_query << "`" << selected_cols[i] << "`";
+ mysql_query << escape_mysql_identifier(selected_cols[i]);
}
- mysql_query << " FROM `" << schema << "`.`" << table << "`";
+ mysql_query << " FROM " << escape_mysql_identifier(schema) << "." << escape_mysql_identifier(table);🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 329 - 334, The MySQL identifier usage in the
selected_cols loop and the final FROM clause is quoting names with backticks but
not escaping embedded backticks; add an escape helper (e.g.,
escape_mysql_identifier) mirroring escape_identifier() behavior that duplicates
any internal '`' and wraps the result in surrounding backticks, then replace
usages building mysql_query for selected_cols, schema and table to call
escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and
escape_mysql_identifier(table) so all identifiers are safely escaped.
| db->rdlock(); | ||
| indexes_result = db->execute_statement(indexes_sql.str().c_str(), &error, &cols, &affected); | ||
|
|
||
| if (!indexes_result || indexes_result->rows.empty()) { | ||
| db->rdunlock(); | ||
| result["success"] = true; | ||
| result["query"] = query; | ||
| result["total_matches"] = 0; | ||
| result["results"] = json::array(); | ||
| return result.dump(); | ||
| } |
There was a problem hiding this comment.
Memory leak: indexes_result not freed before early return.
If indexes_result is non-null but has empty rows, the code returns early (line 492) without freeing indexes_result.
🐛 Proposed fix
if (!indexes_result || indexes_result->rows.empty()) {
db->rdunlock();
+ if (indexes_result) delete indexes_result;
result["success"] = true;
result["query"] = query;
result["total_matches"] = 0;
result["results"] = json::array();
return result.dump();
}🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 483 - 493, The early-return path that handles
(!indexes_result || indexes_result->rows.empty()) leaks memory when
indexes_result is non-null; modify that branch so that if indexes_result is
non-null you free/delete/cleanup it (e.g., delete indexes_result or call its
provided free method) before returning, and keep db->rdunlock() in the correct
order (free indexes_result if non-null, then rdunlock(), then set result fields
and return). Ensure you reference and handle the indexes_result variable from
the execute_statement call.
| # Extract JSON field value | ||
| extract_json_field() { | ||
| local response="$1" | ||
| local field="$2" | ||
|
|
||
| if command -v jq >/dev/null 2>&1; then | ||
| echo "${response}" | jq -r "${field}" 2>/dev/null || echo "" | ||
| else | ||
| # Fallback to grep/sed for basic JSON parsing | ||
| echo "${response}" | grep -o "\"${field}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | sed 's/.*: "\(.*\)"/\1/' || echo "" | ||
| fi | ||
| } | ||
|
|
||
| # Check JSON boolean field | ||
| check_json_bool() { | ||
| local response="$1" | ||
| local field="$2" | ||
| local expected="$3" | ||
|
|
||
| # Extract inner result from double-nested structure | ||
| local inner_result | ||
| inner_result=$(extract_inner_result "${response}") | ||
|
|
||
| if command -v jq >/dev/null 2>&1; then | ||
| local actual | ||
| actual=$(echo "${inner_result}" | jq -r "${field}" 2>/dev/null) | ||
| [ "${actual}" = "${expected}" ] | ||
| else | ||
| # Fallback: check for true/false string | ||
| if [ "${expected}" = "true" ]; then | ||
| echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*true" | ||
| else | ||
| echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*false" | ||
| fi | ||
| fi | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the target lines and surrounding context
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '200,250p'Repository: ProxySQL/proxysql-vec
Length of output: 186
🏁 Script executed:
# Also check the second location mentioned
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '1280,1300p'Repository: ProxySQL/proxysql-vec
Length of output: 608
🏁 Script executed:
# Search for all calls to extract_json_field to see what field parameters are passed
rg "extract_json_field" scripts/mcp/test_mcp_fts.sh -A 2 -B 2Repository: ProxySQL/proxysql-vec
Length of output: 2143
🏁 Script executed:
# Search for all calls to check_json_bool to see what field parameters are passed
rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -A 2 -B 2Repository: ProxySQL/proxysql-vec
Length of output: 3831
🏁 Script executed:
# Verify the actual implementation matches the snippet - read exact lines 205-240
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '205,240p'Repository: ProxySQL/proxysql-vec
Length of output: 1463
🏁 Script executed:
# Check if there's an extract_inner_result function to understand the full context
rg "extract_inner_result" scripts/mcp/test_mcp_fts.sh -A 5Repository: ProxySQL/proxysql-vec
Length of output: 694
jq is de facto required but not enforced; the grep/sed fallback is non-functional and will silently fail tests.
The fallback JSON parsing in extract_json_field and check_json_bool cannot work with jq dot-notation fields (e.g., .success, .total_matches, .results | length). Line 214 and lines 235/237 search grep for literal "${field}", which becomes patterns like "\.success" that never match JSON keys like "success". Complex paths with pipes (.results | length, .indexes[] | select(...)) are unsupported entirely. Without jq, all extraction calls return empty strings silently, causing tests to fail or pass incorrectly. Add a pre-flight jq requirement check at lines 1284–1295, or redesign the fallback to strip leading dots and support basic path traversal.
🔧 Suggested pre-flight guard (make jq mandatory)
# Pre-flight checks
log_section "Pre-flight Checks"
+ if ! command -v jq >/dev/null 2>&1; then
+ log_error "jq is required for this test suite."
+ exit 1
+ fi
+
if ! check_mcp_server; then
log_error "MCP server check failed. Exiting."
exit 1
fi🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 205 - 240, The current fallback
JSON parsing in extract_json_field and check_json_bool is broken for jq-style
dot-notation and complex paths (e.g., ".success", ".results | length") and
silently fails; update the script to enforce jq presence at startup (fail fast
with a clear error) OR rewrite the fallback to canonicalize the field by
stripping leading dots and support simple dot-path traversal and array length
checks used by extract_inner_result; specifically modify the startup check to
call command -v jq and exit with an informative message if missing, and update
extract_json_field/check_json_bool to delegate to jq-only or to parse dot paths
by splitting on '.' and traversing nested objects/arrays so patterns like
.success and .results|length return correct values.
| # Check results | ||
| local total_matches | ||
| local result_count | ||
| total_matches=$(extract_json_field "${response}" ".total_matches") | ||
| result_count=$(extract_json_field "${response}" ".results | length") | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
wc -l scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 97
🏁 Script executed:
sed -n '505,520p' scripts/mcp/test_mcp_fts.sh
sed -n '535,550p' scripts/mcp/test_mcp_fts.sh
sed -n '565,575p' scripts/mcp/test_mcp_fts.sh
sed -n '590,605p' scripts/mcp/test_mcp_fts.sh
sed -n '690,705p' scripts/mcp/test_mcp_fts.sh
sed -n '773,785p' scripts/mcp/test_mcp_fts.sh
sed -n '850,865p' scripts/mcp/test_mcp_fts.sh
sed -n '930,945p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 3718
🏁 Script executed:
grep -n "^extract_json_field\|^extract_inner_field\|^extract_inner_result" scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 148
🏁 Script executed:
rg "function extract_json_field|function extract_inner_field|function extract_inner_result" scripts/mcp/ -A 5Repository: ProxySQL/proxysql-vec
Length of output: 47
🏁 Script executed:
rg "extract_json_field|extract_inner_field|extract_inner_result" scripts/mcp/ -B 2 -A 2 | head -100Repository: ProxySQL/proxysql-vec
Length of output: 5618
🏁 Script executed:
rg "^fts_tool_call\(|function fts_tool_call" scripts/mcp/ -A 10Repository: ProxySQL/proxysql-vec
Length of output: 573
🏁 Script executed:
sed -n '500,520p' scripts/mcp/test_mcp_fts.sh
sed -n '530,555p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 1494
🏁 Script executed:
sed -n '156,195p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 942
🏁 Script executed:
rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -B 2 -A 5 | head -80Repository: ProxySQL/proxysql-vec
Length of output: 1886
🏁 Script executed:
sed -n '200,280p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 2369
🏁 Script executed:
sed -n '305,330p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 1070
🏁 Script executed:
sed -n '690,710p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 732
🏁 Script executed:
sed -n '850,870p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 663
🏁 Script executed:
sed -n '775,785p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 426
🏁 Script executed:
sed -n '930,945p' scripts/mcp/test_mcp_fts.shRepository: ProxySQL/proxysql-vec
Length of output: 551
🏁 Script executed:
# Let's also verify instances that correctly use extract_inner_field to see the pattern
rg "extract_inner_field.*total_matches|extract_inner_field.*row_count" scripts/mcp/test_mcp_fts.sh -B 3 -A 1Repository: ProxySQL/proxysql-vec
Length of output: 588
Use extract_inner_field consistently for MCP tool result fields.
Tool responses from fts_tool_call are nested in the MCP JSON-RPC structure. While check_json_bool correctly unwraps them via extract_inner_result, the flagged lines extract other fields like .total_matches and .row_count using extract_json_field, which operates on the wrapped response and yields empty/null values. This causes numeric comparisons to fail silently.
Replace extract_json_field with extract_inner_field at these locations:
- Lines 509-514:
.total_matches,.results | length - Lines 541-543:
.total_matches - Lines 568-570:
.total_matches - Lines 596-598:
.total_matches - Lines 695-698:
.total_matches - Lines 778-779:
.row_count - Lines 855-857:
.rebuilt_count - Lines 935-936:
.total_matches
🔧 Example fix (apply similarly to all flagged locations)
- total_matches=$(extract_json_field "${response}" ".total_matches")
- result_count=$(extract_json_field "${response}" ".results | length")
+ total_matches=$(extract_inner_field "${response}" ".total_matches")
+ result_count=$(extract_inner_field "${response}" ".results | length")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Check results | |
| local total_matches | |
| local result_count | |
| total_matches=$(extract_json_field "${response}" ".total_matches") | |
| result_count=$(extract_json_field "${response}" ".results | length") | |
| # Check results | |
| local total_matches | |
| local result_count | |
| total_matches=$(extract_inner_field "${response}" ".total_matches") | |
| result_count=$(extract_inner_field "${response}" ".results | length") |
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 509 - 514, The MCP tool responses
are wrapped; replace uses of extract_json_field with extract_inner_field when
pulling numeric fields from the wrapped fts_tool_call response (e.g., change
total_matches, results | length, row_count, rebuilt_count extractions). Locate
occurrences where extract_json_field is used alongside variables like
total_matches, result_count, row_count, rebuilt_count (and where check_json_bool
already uses extract_inner_result) and swap to extract_inner_field so the code
unwraps the MCP JSON-RPC result before evaluating numeric comparisons; keep the
same variable names and call sites (e.g., where response is parsed after
fts_tool_call and in routines that follow check_json_bool).
| anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so | ||
| $(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@ | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and check the specific lines
echo "=== Checking Makefile lines 171-173 ==="
sed -n '171,173p' test/tap/tests/Makefile
echo ""
echo "=== Checking Makefile lines 196 (and surrounding context) ==="
sed -n '194,198p' test/tap/tests/Makefile
echo ""
echo "=== Checking which spelling appears in the Makefile ==="
grep -n "EXCLUDE_TRACKING_VAR" test/tap/tests/MakefileRepository: ProxySQL/proxysql-vec
Length of output: 1906
🏁 Script executed:
# Search the codebase for both spellings
echo "=== Checking for EXCLUDE_TRACKING_VARAIABLES (misspelled) ==="
rg "EXCLUDE_TRACKING_VARAIABLES" --type cpp --type h
echo ""
echo "=== Checking for EXCLUDE_TRACKING_VARIABLES (correct) ==="
rg "EXCLUDE_TRACKING_VARIABLES" --type cpp --type hRepository: ProxySQL/proxysql-vec
Length of output: 372
Fix typo in preprocessor define: EXCLUDE_TRACKING_VARAIABLES should be EXCLUDE_TRACKING_VARIABLES.
Line 172 uses -DEXCLUDE_TRACKING_VARAIABLES (misspelled), while the codebase defines and checks for -DEXCLUDE_TRACKING_VARIABLES (correct spelling). This typo means the intended preprocessor guard won't work as the code never references the misspelled symbol.
Suggested fix
anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
- $(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
+ $(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so | |
| $(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@ | |
| anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so | |
| $(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@ |
🤖 Prompt for AI Agents
In `@test/tap/tests/Makefile` around lines 171 - 173, The Makefile target
anomaly_detection-t currently passes the misspelled preprocessor define
-DEXCLUDE_TRACKING_VARAIABLES; update that flag to the correctly spelled
-DEXCLUDE_TRACKING_VARIABLES in the anomaly_detection-t rule so the code that
checks for EXCLUDE_TRACKING_VARIABLES is enabled as intended.
|
|
||
| ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size); | ||
| diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count()); | ||
| diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); |
There was a problem hiding this comment.
Incorrect format specifier may cause undefined behavior.
std::chrono::milliseconds::rep is typically int64_t/long long, not long. Using %ld is incorrect on LLP64 platforms (e.g., Windows 64-bit) where long is 32 bits. This also creates inconsistency with line 332 which still uses %lld for similar timing output.
Suggested fix: revert to %lld or cast explicitly
- diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
+ diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); | |
| diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count())); |
🤖 Prompt for AI Agents
In `@test/tap/tests/vector_db_performance-t.cpp` at line 325, The diag call uses
the wrong printf format specifier for insert_duration.count(); update the format
to safely match the underlying type by either using "%lld" and casting
insert_duration.count() to (long long) or by casting to long and using "%ld"
consistently; change the diag invocation that prints "Time to insert %zu
entries: %ld ms" to use a matching specifier and cast (e.g., "%zu entries: %lld
ms", (long long)insert_duration.count()) so the format matches
insert_duration.count() and aligns with the other timing output.
| diag("Average time per concurrent operation: %lld microseconds", avg_time); | ||
| diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count()); | ||
| diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count()); |
There was a problem hiding this comment.
Inconsistent and incorrect format specifier.
Line 380 correctly uses %lld for avg_time (long long), but line 381 uses %ld for total_duration.count() which also returns a 64-bit integer type. This inconsistency and type mismatch should be corrected.
Suggested fix
diag("Average time per concurrent operation: %lld microseconds", avg_time);
- diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
+ diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));🤖 Prompt for AI Agents
In `@test/tap/tests/vector_db_performance-t.cpp` around lines 380 - 381, The
second diagnostic uses the wrong/ inconsistent printf format for
total_duration.count(); change the format to match the 64-bit type (use %lld)
and ensure you cast total_duration.count() to (long long) in the diag call so
both diag("Average time per concurrent operation: %lld microseconds", avg_time)
and diag("Total time for %d operations: %lld microseconds", num_operations,
(long long)total_duration.count()) use the same 64-bit specifier and types;
locate these calls (diag with avg_time and diag with total_duration.count()) and
update the format/specifier and cast accordingly.
There was a problem hiding this comment.
Pull request overview
This PR addresses security, memory safety, and thread safety issues identified in PR #21 code reviews for the Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes for test compilation issues.
Changes:
- Security fixes: SQL injection prevention via identifier escaping and FTS5 query wrapping, allowlist-based name sanitization
- Memory safety fixes: VLA replacement with std::vector, memory leak fixes for error strings
- Thread safety fixes: mutex protection for FTS pointer swaps and runtime path changes
- Test improvements: format specifier corrections, missing includes, build rule additions
- New comprehensive test scripts for FTS functionality validation
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/tap/tests/vector_db_performance-t.cpp | Fixed printf format specifiers for duration types, added missing includes |
| test/tap/tests/anomaly_detection-t.cpp | Added stub function for SQLite3_Server_session_handler |
| test/tap/tests/ai_llm_retry_scenarios-t.cpp | Added missing string header include |
| test/tap/tests/Makefile | Added build rule for anomaly_detection-t test |
| scripts/mcp/test_mcp_fts_detailed.sh | New detailed FTS validation script (has security issue) |
| scripts/mcp/test_mcp_fts.sh | New comprehensive FTS test script |
| lib/MySQL_FTS.cpp | New FTS implementation with security fixes |
| lib/MySQL_FTS.h | New FTS class header |
| lib/MySQL_Tool_Handler.cpp | FTS integration with mutex protection |
| lib/MySQL_Tool_Handler.h | FTS method declarations |
| lib/Query_Tool_Handler.cpp | FTS tool registration, debug statement removal |
| lib/ProxySQL_MCP_Server.cpp | FTS path configuration parameter |
| lib/MCP_Thread.cpp | FTS path variable handling |
| lib/MCP_Thread.h | FTS path variable declaration |
| lib/Makefile | Added MySQL_FTS.oo to build |
| include/MySQL_Tool_Handler.h | FTS declarations and method signatures |
| include/MySQL_FTS.h | Complete FTS class interface |
| include/MCP_Thread.h | FTS path variable in struct |
| doc/MCP/FTS_USER_GUIDE.md | Comprehensive FTS user guide |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| diag("Average time per concurrent operation: %lld microseconds", avg_time); | ||
| diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count()); | ||
| diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count()); |
There was a problem hiding this comment.
The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(total_duration.count()) for portability.
| diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count()); | |
| diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count())); |
|
|
||
| mysql_exec() { | ||
| local sql="$1" | ||
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}" |
There was a problem hiding this comment.
The MySQL password is exposed in the command line arguments using -p"${MYSQL_PASSWORD}". This is a security vulnerability as the password can be seen in process listings by other users. The script should use the MYSQL_PWD environment variable instead, which is already done in other parts of the script. Change this line to: MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"
| mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}" | |
| MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}" |
|
|
||
| if (error || !resultset || resultset->rows.empty()) { | ||
| result["error"] = "Index not found for " + schema + "." + table; | ||
| if (resultset) delete resultset; |
There was a problem hiding this comment.
Missing error cleanup in reindex function. If an error is set from execute_statement (line 726), it should be freed before returning. Add if (error) (*proxy_sqlite3_free)(error); after line 730 and before returning the error result.
| if (resultset) delete resultset; | |
| if (resultset) delete resultset; | |
| if (error) (*proxy_sqlite3_free)(error); |
| std::string MySQL_FTS::sanitize_name(const std::string& name) { | ||
| const size_t MAX_NAME_LEN = 100; | ||
| std::string sanitized; | ||
| // Allowlist: only ASCII letters, digits, underscore | ||
| for (char c : name) { | ||
| if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || | ||
| (c >= '0' && c <= '9') || c == '_') { | ||
| sanitized.push_back(c); | ||
| } | ||
| } | ||
| // Prevent leading digit (SQLite identifiers can't start with digit) | ||
| if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') { | ||
| sanitized.insert(sanitized.begin(), '_'); | ||
| } | ||
| // Enforce maximum length | ||
| if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN); | ||
| return sanitized; | ||
| } |
There was a problem hiding this comment.
The sanitize_name function could return an empty string if the input contains no valid characters. An empty sanitized name could lead to issues when constructing table names. Consider adding validation to ensure the sanitized name is not empty, or prepend a default prefix like "fts_" when the result is empty.
|
|
||
| ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size); | ||
| diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count()); | ||
| diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); |
There was a problem hiding this comment.
The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(insert_duration.count()) for portability.
| diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count()); | |
| diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count())); |
This commit addresses all recommendations from CodeRabbit, Gemini Code Assist, and Copilot for PR #25 (FTS security and code quality improvements). Critical Security Fixes: - MCP_Thread.cpp: Rollback fts_path on reset failure to keep config consistent - MySQL_FTS.cpp: Add escape_mysql_identifier() for MySQL query identifier escaping - MySQL_FTS.cpp: Add unique hash-based fallback to sanitize_name() for empty strings - MySQL_FTS.cpp: Add where_clause validation to block dangerous SQL patterns Memory Safety Fixes: - MySQL_FTS.cpp: Fix indexes_result memory leak on early return in search() - MySQL_FTS.h: Delete copy/move operations to prevent accidental resource duplication Thread Safety Documentation: - MySQL_Tool_Handler.cpp: Add comment explaining FTS lock design rationale Test Script Improvements: - test_mcp_fts.sh: Add curl timeouts (5s connect, 30s max) - test_mcp_fts.sh: Remove unused delete_response variable - test_mcp_fts_detailed.sh: Make cleanup tolerant of non-existent indexes Build Fixes: - Makefile: Fix EXCLUDE_TRACKING_VARAIABLES typo to EXCLUDE_TRACKING_VARIABLES - vector_db_performance-t.cpp: Fix printf format specifiers to %lld with cast Schema Fixes: - Query_Tool_Handler.cpp: Change fts_index_table columns schema from string to array Code Cleanup: - MySQL_Tool_Handler.cpp: Remove all remaining debug fprintf statements (34 lines) - Documentation: Change "Full Text" to "Full-Text" (hyphenated) Total: ~50 fixes across 10 files
Summary
This PR addresses all recommendations from 3 AI code reviewers (CodeRabbit, Gemini Code Assist, Copilot) for PR #21 which added Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes unrelated to PR #21.
PR #21 Review Fixes
Security Fixes (Critical)
escape_identifier()helper that doubles backticks for proper SQLite identifier escaping. Applied to all DDL statements (CREATE TABLE, CREATE VIRTUAL TABLE, triggers).OR,AND,NOT,NEAR,*,"",()).Memory Safety Fixes (High)
std::vectorinMySQL_FTS::init(), addeddelete db;on error path to prevent leak.(*proxy_sqlite3_free)(error);before return inlist_indexes().reindex_json["error"]toreindex_json.value("error", std::string("unknown error")).Thread Safety Fixes (High)
fts->init()call outside lock, only swap pointer under lock.Code Cleanup
fprintfstatements fromQuery_Tool_Handler.cpp#include <memory>fromMySQL_FTS.hTest Script Security Fixes
-p"${MYSQL_PASSWORD}"toMYSQL_PWD="${MYSQL_PASSWORD}" mysqlto prevent password in process listescape_sql()shell function and applied to INSERT statementcurl ${CURL_OPTS:+"${CURL_OPTS}"}FTS_INDEX_NAMEandSEARCH_QUERIESDocumentation Fixes
text,bash)Files Modified
include/MySQL_FTS.hlib/MySQL_FTS.cpplib/MySQL_Tool_Handler.cpplib/Query_Tool_Handler.cppdoc/MCP/FTS_USER_GUIDE.mdscripts/mcp/test_mcp_fts.shscripts/mcp/test_mcp_fts_detailed.shAdditional Fixes
This PR also includes additional cherry-picked fixes unrelated to PR #21.
Verification
cd lib && make clean && make✅bash scripts/mcp/test_mcp_fts.shCloses #21