Skip to content

Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes#25

Merged
renecannao merged 9 commits intov3.1-vecfrom
v3.1-vec_21
Jan 21, 2026
Merged

Fix PR #21 Review: Security, Memory Safety, Thread Safety + Additional Fixes#25
renecannao merged 9 commits intov3.1-vecfrom
v3.1-vec_21

Conversation

@renecannao
Copy link

Summary

This PR addresses all recommendations from 3 AI code reviewers (CodeRabbit, Gemini Code Assist, Copilot) for PR #21 which added Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes unrelated to PR #21.

PR #21 Review Fixes

Security Fixes (Critical)

  • SQL Injection - Table Name Escaping: Added escape_identifier() helper that doubles backticks for proper SQLite identifier escaping. Applied to all DDL statements (CREATE TABLE, CREATE VIRTUAL TABLE, triggers).
  • SQL Injection - MATCH Clause FTS5 Operator: Wrapped FTS5 query in double quotes and escaped embedded quotes to prevent FTS5 operator injection (OR, AND, NOT, NEAR, *, "", ()).
  • sanitize_name() Allowlist Validation: Replaced blocklist with allowlist (only ASCII letters, digits, underscore) with max length enforcement.

Memory Safety Fixes (High)

  • VLA Replacement: Replaced VLA with std::vector in MySQL_FTS::init(), added delete db; on error path to prevent leak.
  • Error String Leak: Added (*proxy_sqlite3_free)(error); before return in list_indexes().
  • Potential Exception: Changed reindex_json["error"] to reindex_json.value("error", std::string("unknown error")).

Thread Safety Fixes (High)

  • reinit_fts() Missing Lock: Added mutex lock around pointer swap to prevent race condition.
  • reset_fts_path() Blocking I/O: Moved blocking fts->init() call outside lock, only swap pointer under lock.

Code Cleanup

  • Removed 7 debug fprintf statements from Query_Tool_Handler.cpp
  • Removed unused #include <memory> from MySQL_FTS.h

Test Script Security Fixes

  • MySQL Password Exposure: Changed from -p"${MYSQL_PASSWORD}" to MYSQL_PWD="${MYSQL_PASSWORD}" mysql to prevent password in process list
  • SQL Injection in INSERT: Added escape_sql() shell function and applied to INSERT statement
  • CURL_OPTS Quoting: Fixed to curl ${CURL_OPTS:+"${CURL_OPTS}"}
  • Unused Variables: Removed FTS_INDEX_NAME and SEARCH_QUERIES

Documentation Fixes

  • Fixed bare URL to markdown link format
  • Added code block language identifiers (text, bash)

Files Modified

  • include/MySQL_FTS.h
  • lib/MySQL_FTS.cpp
  • lib/MySQL_Tool_Handler.cpp
  • lib/Query_Tool_Handler.cpp
  • doc/MCP/FTS_USER_GUIDE.md
  • scripts/mcp/test_mcp_fts.sh
  • scripts/mcp/test_mcp_fts_detailed.sh

Additional Fixes

This PR also includes additional cherry-picked fixes unrelated to PR #21.

Verification

  • Build: cd lib && make clean && make
  • Ready for testing with: bash scripts/mcp/test_mcp_fts.sh

Closes #21

rahim-kanji and others added 8 commits January 20, 2026 14:19
Implement BM25-ranked full-text search capability for MySQL/MariaDB tables using SQLite-based external FTS index.

Changes:
- Add MySQL_FTS class for managing SQLite FTS indexes
- Add FTS tools: fts_index_table, fts_search, fts_reindex, fts_delete_index, fts_list_indexes, fts_rebuild_all
- Add thread-safe FTS lifecycle management with fts_lock mutex
- Add reset_fts_path() for runtime FTS database path configuration
- Add comprehensive FTS test scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh)
Add full-text search (FTS) tools to MCP query server
- Add special build rule in Makefile for anomaly_detection-t that includes:
  - $(OBJ) for global variables (GloVars, GloGATH)
  - -Wl,--allow-multiple-definition to allow test's main() to override ProxySQL's
  - ClickHouse client libraries (libclickhouse-cpp-lib.a, libzstdstatic.a, liblz4.a)
  - SQLite rembed library (libsqlite_rembed.a)
  - -lscram -lusual for PostgreSQL SCRAM support
- Add stub function SQLite3_Server_session_handler required by SQLite3_Server.cpp

Resolves compilation errors for anomaly_detection-t test.
- Add #include <string> for C++ std::string support
- Add #include <cmath> for sqrt() function
- Change format %lld to %ld for chrono duration types (long int, not long long)

Resolves compilation errors for vector_db_performance-t test.
- Add #include <string> for C++ std::string support

Resolves compilation errors for ai_llm_retry_scenarios-t test.
…leanup

Security fixes:
- Add escape_identifier() helper for proper SQLite identifier escaping
- Replace sanitize_name() with allowlist validation (ASCII letters, digits, underscore only)
- Fix MATCH clause FTS5 operator injection by wrapping query in double quotes
- Apply escape_identifier() to all DDL statements (CREATE, DROP, triggers)

Memory safety fixes:
- Replace VLA with std::vector in MySQL_FTS::init(), add delete on error path
- Fix memory leak: free error string before return in list_indexes()
- Fix reindex_json["error"] potential exception using .value() with default

Thread safety fixes:
- reinit_fts(): Add mutex lock around pointer swap
- reset_fts_path(): Move blocking init() outside lock, only swap pointer under lock

Code cleanup:
- Remove 7 debug fprintf statements from Query_Tool_Handler.cpp
- Remove unused #include <memory> from MySQL_FTS.h

Test script security fixes:
- Use MYSQL_PWD environment variable instead of -p"..." for password
- Add escape_sql() function and apply to INSERT statement
- Fix CURL_OPTS quoting: ${CURL_OPTS:+"${CURL_OPTS}"}
- Remove unused FTS_INDEX_NAME and SEARCH_QUERIES variables

Documentation fixes:
- Fix bare URL to markdown link format
- Add code block language identifiers (text, bash)
@gemini-code-assist
Copy link

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the recently introduced Full-Text Search (FTS) functionality by addressing critical feedback from AI code reviewers. The primary focus is on bolstering the security, memory safety, and thread safety of the FTS implementation, ensuring a more robust and reliable system. Additionally, it integrates the FTS capabilities more deeply into the existing toolset, provides extensive user documentation, and enhances test coverage, making the feature more stable and usable.

Highlights

  • Security Enhancements: Implemented robust SQL injection prevention for table names and FTS5 MATCH clauses, and switched to an allowlist for sanitize_name() to prevent unexpected characters.
  • Memory Safety Improvements: Addressed potential memory leaks by replacing VLAs with std::vector, ensuring proper delete calls on error paths, and handling JSON exceptions gracefully.
  • Thread Safety Corrections: Introduced mutex locking for FTS reinitialization to prevent race conditions and refactored blocking I/O operations to occur outside critical locks.
  • Full-Text Search (FTS) Integration: Introduced a new MySQL_FTS class and integrated six FTS-related tools (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) into the MCP query handler.
  • Comprehensive FTS Documentation: Added a detailed FTS_USER_GUIDE.md covering overview, architecture, configuration, tool reference, usage examples, API endpoints, best practices, and troubleshooting.
  • Enhanced FTS Test Coverage: Added two new shell scripts (test_mcp_fts.sh, test_mcp_fts_detailed.sh) to thoroughly validate the FTS functionality, including runtime path changes and stress testing.
  • General Code Quality & Cleanup: Removed debug fprintf statements, unnecessary includes, and improved test script security by preventing MySQL password exposure and adding SQL escaping.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link

coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Full Text Search (FTS) system for MySQL data discovery with SQLite-backed indexing.
    • New FTS operations: index creation, searching with BM25 ranking, index listing, deletion, reindexing, and rebuild functionality.
    • Added FTS configuration support via runtime variables for dynamic path management.
  • Documentation

    • Added comprehensive FTS user guide covering setup, configuration, usage examples, API endpoints, best practices, and troubleshooting.
  • Tests

    • Added automated test suites for FTS validation and integration testing.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This PR introduces a complete Full-Text Search (FTS) system for MySQL tables using SQLite FTS5 backend. It adds a new MySQL_FTS class for index management, six new MCP tools (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all), thread-safe integration with MySQL_Tool_Handler, runtime-configurable FTS database path, and extensive test coverage with documentation.

Changes

Cohort / File(s) Summary
FTS Core Implementation
include/MySQL_FTS.h, lib/MySQL_FTS.cpp
New MySQL_FTS class implementing SQLite FTS5-backed indexing and search. Includes schema initialization, index table creation, data ingestion from MySQL, search with BM25 ranking, and index lifecycle management (create, list, delete, reindex, rebuild). Implements SQL escaping, identifier sanitization, and error handling with JSON result formatting.
MySQL_Tool_Handler Integration
include/MySQL_Tool_Handler.h, lib/MySQL_Tool_Handler.cpp
Extended constructor with new fts_path parameter. Added FTS member (fts) and mutex (fts_lock) for thread-safe access. Introduced public FTS API methods (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) and reinit_fts() for runtime FTS path reconfiguration. Moved execute_query from private to public.
MCP Configuration
include/MCP_Thread.h, lib/MCP_Thread.cpp
Added mcp_fts_path runtime-configurable variable. Constructor allocates storage; destructor frees it. Extended get_variable and set_variable to handle "fts_path" with runtime application via reset_fts_path() on the MySQL_Tool_Handler.
Query Tool Dispatch
lib/Query_Tool_Handler.cpp
Added six FTS tool definitions (fts_index_table, fts_search, fts_list_indexes, fts_delete_index, fts_reindex, fts_rebuild_all) with parameter specifications and descriptions. Extended execute_tool to dispatch FTS operations with parameter extraction and delegation to mysql_handler methods. Removed several debug logging statements.
MCP Server Wiring
lib/ProxySQL_MCP_Server.cpp
Updated MySQL_Tool_Handler constructor call to pass mcp_fts_path as an additional parameter after mcp_catalog_path.
Build System
lib/Makefile
Added MySQL_FTS.oo to the _OBJ_CXX list for compilation into libproxysql.
Documentation
doc/MCP/FTS_USER_GUIDE.md
New comprehensive user guide covering FTS overview, architecture, data flow, admin configuration, quick-start usage, tool reference, API endpoints, best practices, troubleshooting, test scripts, FTS5 syntax appendix, and version history.
FTS Test Suites
scripts/mcp/test_mcp_fts.sh, scripts/mcp/test_mcp_fts_detailed.sh
Two comprehensive Bash test scripts validating FTS functionality. Basic suite includes 21+ tests covering index creation, listing, searching (simple/phrase/cross-table/ranking/pagination), deletion, reindexing, and rebuild. Detailed suite tests runtime configuration changes, tool availability, snippet highlighting, and empty query handling.
Test Infrastructure
test/tap/tests/Makefile, test/tap/tests/anomaly_detection-t.cpp, test/tap/tests/ai_llm_retry_scenarios-t.cpp, test/tap/tests/vector_db_performance-t.cpp
Added anomaly_detection-t build target. Added forward declarations and stub functions (SQLite3_Server_session_handler) in anomaly_detection-t.cpp. Added include in ai_llm_retry_scenarios-t.cpp. Fixed formatting specifiers from %lld to %ld and added include in vector_db_performance-t.cpp.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant MySQL_Tool_Handler
    participant MySQL_FTS
    participant SQLite DB
    participant MySQL

    Client->>MySQL_Tool_Handler: fts_index_table(schema, table, columns, pk, where)
    MySQL_Tool_Handler->>MySQL_FTS: index_table(schema, table, columns, pk, where)
    MySQL_FTS->>SQLite DB: check index exists
    alt Index exists
        SQLite DB-->>MySQL_FTS: true
    else Create new index
        MySQL_FTS->>SQLite DB: create data_table
        MySQL_FTS->>SQLite DB: create fts_table (FTS5)
        MySQL_FTS->>SQLite DB: create triggers for sync
    end
    MySQL_FTS->>MySQL: execute SELECT query
    MySQL-->>MySQL_FTS: result set
    MySQL_FTS->>SQLite DB: batch insert into data_table
    MySQL_FTS->>SQLite DB: update fts_indexes metadata
    SQLite DB-->>MySQL_FTS: success
    MySQL_FTS-->>MySQL_Tool_Handler: JSON result
    MySQL_Tool_Handler-->>Client: FTS index created
Loading
sequenceDiagram
    participant Client
    participant MySQL_Tool_Handler
    participant MySQL_FTS
    participant SQLite DB

    Client->>MySQL_Tool_Handler: fts_search(query, schema, table, limit, offset)
    MySQL_Tool_Handler->>MySQL_FTS: search(query, schema, table, limit, offset)
    MySQL_FTS->>SQLite DB: retrieve index metadata
    SQLite DB-->>MySQL_FTS: matching indexes
    loop For each matching index
        MySQL_FTS->>SQLite DB: FTS5 search with BM25 ranking
        SQLite DB-->>MySQL_FTS: ranked results with snippets
    end
    MySQL_FTS->>MySQL_FTS: aggregate & paginate results
    MySQL_FTS-->>MySQL_Tool_Handler: JSON results array
    MySQL_Tool_Handler-->>Client: search results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Whiskers twitching with delight,
SQLite indexes shining bright!
FTS searching hops so fast,
BM25 rankings unsurpassed!
MySQL tables indexed true,
MCP tools now see you through!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: addressing security, memory safety, and thread safety issues from PR #21 review plus additional fixes.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing security fixes, memory safety improvements, thread safety corrections, and documentation updates.
Linked Issues check ✅ Passed All PR #21 objectives are met: six FTS tools implemented, thread-safe operations with mutexes, configurable FTS path with reset support, and comprehensive test scripts. The PR also implements recommended security, memory, and thread-safety fixes from code reviewers.
Out of Scope Changes check ✅ Passed Most changes are in-scope fixes to PR #21 implementation. Minor out-of-scope changes include unrelated test file fixes (ai_llm_retry_scenarios-t.cpp, anomaly_detection-t.cpp, vector_db_performance-t.cpp, Makefile) and Makefile modifications, but these are explicitly mentioned as cherry-picked additional fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Full Text Search (FTS) module to ProxySQL, leveraging SQLite's FTS5 extension for fast, indexed searching of MySQL data. The architecture includes MCP query endpoint integration, a Query Tool Handler, and a MySQL Tool Handler wrapping the MySQL_FTS class. Configuration is managed via ProxySQL's admin interface, with variables for the FTS database path and MySQL backend connection details. The FTS Tools Reference details six tools for indexing, searching, and managing FTS indexes. The code review highlights several key aspects of the changes, including robust input sanitization and identifier escaping to prevent SQL injection vulnerabilities, thread-safe access to the FTS database, runtime configuration changes for the FTS database path, and the inclusion of comprehensive test cases. Additionally, there were format specifier corrections for printing long int values and a note about a future date in the version history.

Comment on lines +1047 to +1140
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->index_table(schema, table, columns, primary_key, where_clause, this);
pthread_mutex_unlock(&fts_lock);
return out;
}

std::string MySQL_Tool_Handler::fts_search(
const std::string& query,
const std::string& schema,
const std::string& table,
int limit,
int offset
) {
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->search(query, schema, table, limit, offset);
pthread_mutex_unlock(&fts_lock);
return out;
}

std::string MySQL_Tool_Handler::fts_list_indexes() {
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->list_indexes();
pthread_mutex_unlock(&fts_lock);
return out;
}

std::string MySQL_Tool_Handler::fts_delete_index(const std::string& schema, const std::string& table) {
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->delete_index(schema, table);
pthread_mutex_unlock(&fts_lock);
return out;
}

std::string MySQL_Tool_Handler::fts_reindex(const std::string& schema, const std::string& table) {
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->reindex(schema, table, this);
pthread_mutex_unlock(&fts_lock);
return out;
}

std::string MySQL_Tool_Handler::fts_rebuild_all() {
pthread_mutex_lock(&fts_lock);
if (!fts) {
json result;
result["success"] = false;
result["error"] = "FTS not initialized";
pthread_mutex_unlock(&fts_lock);
return result.dump();
}

std::string out = fts->rebuild_all(this);
pthread_mutex_unlock(&fts_lock);
return out;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

All FTS tool wrapper functions (fts_index_table, fts_search, etc.) correctly acquire and release fts_lock before and after calling the underlying fts object methods. This ensures thread-safe access to the MySQL_FTS instance and its SQLite database.

Comment on lines 92 to 108
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
// Prevent leading digit (SQLite identifiers can't start with digit)
if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The sanitize_name function implements a robust allowlist approach, preventing leading digits and enforcing a maximum length. This is a strong security measure against potential SQL injection or unexpected behavior when constructing dynamic table names in SQLite.

Comment on lines +111 to +120
std::string MySQL_FTS::escape_identifier(const std::string& identifier) {
std::string escaped;
escaped.reserve(identifier.length() * 2 + 2);
escaped.push_back('`');
for (char c : identifier) {
escaped.push_back(c);
if (c == '`') escaped.push_back('`'); // Double backticks
}
escaped.push_back('`');
return escaped;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The escape_identifier function correctly doubles backticks for SQLite identifier escaping. This is crucial for preventing SQL injection when dynamic table or column names are used in DDL statements.

Comment on lines +170 to +245
int MySQL_FTS::create_index_tables(const std::string& schema, const std::string& table) {
std::string data_table = get_data_table_name(schema, table);
std::string fts_table = get_fts_table_name(schema, table);
std::string escaped_data = escape_identifier(data_table);
std::string escaped_fts = escape_identifier(fts_table);

// Create data table
std::ostringstream create_data_sql;
create_data_sql << "CREATE TABLE IF NOT EXISTS " << escaped_data << " ("
" rowid INTEGER PRIMARY KEY AUTOINCREMENT,"
" schema_name TEXT NOT NULL,"
" table_name TEXT NOT NULL,"
" primary_key_value TEXT NOT NULL,"
" content TEXT NOT NULL,"
" metadata TEXT"
");";

if (!db->execute(create_data_sql.str().c_str())) {
proxy_error("Failed to create data table %s\n", data_table.c_str());
return -1;
}

// Create FTS5 virtual table with external content
std::ostringstream create_fts_sql;
create_fts_sql << "CREATE VIRTUAL TABLE IF NOT EXISTS " << escaped_fts << " USING fts5("
" content, metadata,"
" content=" << escaped_data << ","
" content_rowid='rowid',"
" tokenize='porter unicode61'"
");";

if (!db->execute(create_fts_sql.str().c_str())) {
proxy_error("Failed to create FTS table %s\n", fts_table.c_str());
return -1;
}

// Create triggers for automatic sync (populate the FTS table)
std::string base_name = sanitize_name(schema) + "_" + sanitize_name(table);
std::string escaped_base = escape_identifier(base_name);

// Drop existing triggers if any
db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ai_" + base_name)).c_str());
db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_ad_" + base_name)).c_str());
db->execute(("DROP TRIGGER IF EXISTS " + escape_identifier("fts_au_" + base_name)).c_str());

// AFTER INSERT trigger
std::ostringstream ai_sql;
ai_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ai_" + base_name)
<< " AFTER INSERT ON " << escaped_data << " BEGIN"
<< " INSERT INTO " << escaped_fts << "(rowid, content, metadata)"
<< " VALUES (new.rowid, new.content, new.metadata);"
<< "END;";
db->execute(ai_sql.str().c_str());

// AFTER DELETE trigger
std::ostringstream ad_sql;
ad_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_ad_" + base_name)
<< " AFTER DELETE ON " << escaped_data << " BEGIN"
<< " INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)"
<< " VALUES ('delete', old.rowid, old.content, old.metadata);"
<< "END;";
db->execute(ad_sql.str().c_str());

// AFTER UPDATE trigger
std::ostringstream au_sql;
au_sql << "CREATE TRIGGER IF NOT EXISTS " << escape_identifier("fts_au_" + base_name)
<< " AFTER UPDATE ON " << escaped_data << " BEGIN"
<< " INSERT INTO " << escaped_fts << "(" << escaped_fts << ", rowid, content, metadata)"
<< " VALUES ('delete', old.rowid, old.content, old.metadata);"
<< " INSERT INTO " << escaped_fts << "(rowid, content, metadata)"
<< " VALUES (new.rowid, new.content, new.metadata);"
<< "END;";
db->execute(au_sql.str().c_str());

return 0;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The create_index_tables function correctly uses escape_identifier for table names in CREATE TABLE and CREATE VIRTUAL TABLE statements, as well as for trigger names. This prevents SQL injection vulnerabilities in DDL operations.

Comment on lines +513 to +518
std::string fts_literal = "\"";
for (char c : query) {
fts_literal.push_back(c);
if (c == '"') fts_literal.push_back('"'); // Double quotes
}
fts_literal.push_back('"');

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The FTS5 query is correctly wrapped in double quotes and embedded double quotes are escaped by doubling them. This is an effective measure to prevent FTS5 operator injection, as described in the PR summary.

Comment on lines +1142 to +1179
int MySQL_Tool_Handler::reinit_fts(const std::string& fts_path) {
proxy_info("MySQL_Tool_Handler: Reinitializing FTS with path: %s\n", fts_path.c_str());

// Check if directory exists (SQLite can't create directories)
std::string::size_type last_slash = fts_path.find_last_of("/");
if (last_slash != std::string::npos && last_slash > 0) {
std::string dir = fts_path.substr(0, last_slash);
struct stat st;
if (stat(dir.c_str(), &st) != 0 || !S_ISDIR(st.st_mode)) {
proxy_error("MySQL_Tool_Handler: Directory does not exist for path '%s' (directory: '%s')\n",
fts_path.c_str(), dir.c_str());
return -1;
}
}

// First, test if we can open the new database (outside lock)
MySQL_FTS* new_fts = new MySQL_FTS(fts_path);
if (!new_fts) {
proxy_error("MySQL_Tool_Handler: Failed to create new FTS handler\n");
return -1;
}

if (new_fts->init() != 0) {
proxy_error("MySQL_Tool_Handler: Failed to initialize FTS at %s\n", fts_path.c_str());
delete new_fts;
return -1; // Return error WITHOUT closing old FTS
}

// Success! Now swap the pointer under lock
pthread_mutex_lock(&fts_lock);
MySQL_FTS* old_fts = fts;
fts = new_fts;
pthread_mutex_unlock(&fts_lock);
if (old_fts) delete old_fts;

proxy_info("MySQL_Tool_Handler: FTS reinitialized successfully at %s\n", fts_path.c_str());
return 0;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The reinit_fts function includes a check for directory existence using stat() before attempting to create the new FTS database. This is a good practice to prevent errors if the specified path's parent directory does not exist or is not writable.


| Version | Date | Changes |
|---------|------|---------|
| 0.1.0 | 2025-01 | Initial implementation |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The version history lists a date in the future (2025-01). While this might be a placeholder, it's generally better to use a current or past date for initial documentation releases, or indicate it as a planned future release date explicitly.


if (error) {
result["error"] = "Failed to list indexes: " + std::string(error);
(*proxy_sqlite3_free)(error);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The proxy_sqlite3_free(error) call ensures that the error string allocated by SQLite is properly freed, preventing a memory leak in the list_indexes function's error path.


ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The format specifier for insert_duration.count() was changed from lld to ld. This is a correctness fix for printing long or long int values, ensuring proper output on different systems.


diag("Average time per concurrent operation: %lld microseconds", avg_time);
diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The format specifier for total_duration.count() was changed from lld to ld. This is a correctness fix for printing long or long int values, ensuring proper output on different systems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@lib/MCP_Thread.cpp`:
- Around line 350-364: The code assigns variables.mcp_fts_path = strdup(value)
before calling mysql_tool_handler->reset_fts_path, so on reset failure the
stored path is left inconsistent; change the logic in the "fts_path" branch to
preserve the previous value: save the old pointer (old =
variables.mcp_fts_path), create a new copy (new_val = strdup(value)) but do not
overwrite variables.mcp_fts_path until reset succeeds; call
mysql_tool_handler->reset_fts_path(value) and on success assign
variables.mcp_fts_path = new_val (free old if non-null), on failure free new_val
and leave variables.mcp_fts_path as old, log the error and return -1; ensure no
leaks or double-frees when old is null.

In `@lib/MySQL_FTS.cpp`:
- Around line 483-493: The early-return path that handles (!indexes_result ||
indexes_result->rows.empty()) leaks memory when indexes_result is non-null;
modify that branch so that if indexes_result is non-null you free/delete/cleanup
it (e.g., delete indexes_result or call its provided free method) before
returning, and keep db->rdunlock() in the correct order (free indexes_result if
non-null, then rdunlock(), then set result fields and return). Ensure you
reference and handle the indexes_result variable from the execute_statement
call.
- Around line 329-334: The MySQL identifier usage in the selected_cols loop and
the final FROM clause is quoting names with backticks but not escaping embedded
backticks; add an escape helper (e.g., escape_mysql_identifier) mirroring
escape_identifier() behavior that duplicates any internal '`' and wraps the
result in surrounding backticks, then replace usages building mysql_query for
selected_cols, schema and table to call
escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and
escape_mysql_identifier(table) so all identifiers are safely escaped.
- Around line 92-109: sanitize_name can return an empty string when input has
only invalid characters; update MySQL_FTS::sanitize_name to guarantee a
non-empty, collision-resistant identifier by adding a deterministic fallback:
after sanitization and truncation (use MAX_NAME_LEN), if result is empty or
would collide, prepend or replace with a safe prefix (e.g., "_") and append a
short unique suffix derived from the original name (use std::hash<std::string>
of the original input truncated to a few hex chars) so symbols like
sanitize_name and MAX_NAME_LEN are used but the function always returns a valid,
bounded-length, and effectively unique name.
- Around line 326-338: The code in MySQL_FTS.cpp concatenates the user-provided
where_clause directly into the SQL built in the fts_index_table flow (see
mysql_query construction using selected_cols, schema, table and where_clause),
creating an SQL injection risk; fix it by changing the query construction to use
parameterized/prepared statements from your MySQL client API (build the SELECT
with column/backtick quoting as now, but replace dynamic predicates in
where_clause with named or positional placeholders and bind the corresponding
values), or if placeholders cannot represent arbitrary predicates, enforce a
strict whitelist/validation of allowed columns/operators and parse/escape values
before concatenation; locate the code that forms mysql_query and refactor to use
the client’s prepare/bind/execute API or add validation logic for where_clause
and document the expected safe format for callers.

In `@lib/MySQL_Tool_Handler.cpp`:
- Around line 1047-1058: The FTS methods (fts_index_table, fts_search,
fts_reindex, fts_rebuild_all, fts_delete_index) currently hold
pthread_mutex_lock(&fts_lock) across long-running I/O; refactor them to minimize
lock scope by copying the fts pointer under the lock (e.g., lock fts_lock, if
(!fts) return error, else MySQL_FTS* local_fts = fts; unlock fts_lock) and then
call local_fts->... (index_table/search/reindex/...) outside the mutex, or
alternatively add a clear comment above these functions that the mutex
intentionally serializes all FTS operations for correctness; ensure you
reference the same error behavior (returning JSON error when fts is null) and
preserve thread-safety during the pointer swap like reset_fts_path().

In `@scripts/mcp/test_mcp_fts_detailed.sh`:
- Around line 71-74: The mysql_exec function currently passes the password on
the command line (-p"${MYSQL_PASSWORD}") which exposes it; update mysql_exec to
stop using -p and instead invoke mysql with MYSQL_PWD set in the environment for
that invocation (e.g., set MYSQL_PWD="${MYSQL_PASSWORD}" only for the mysql
command) so the password does not appear in the process list; locate the
mysql_exec function and replace the argument-based password with an
environment-based MYSQL_PWD usage and ensure the variable is not exported beyond
the single invocation.
- Around line 251-260: The cleanup block guarded by CLEANUP calls
fts_delete_index twice and attempts to delete fts_test.orders again even though
it was already removed earlier; update the cleanup to avoid failing under set
-euo pipefail by either removing the redundant call that invokes tool_call
"fts_delete_index" for '{"schema":"fts_test","table":"orders"}' or make that
specific delete call tolerant of non-existent indexes (e.g., append a fallback
so the extract_tool_result/echo/jq pipeline does not cause a non-zero exit on
"not found" errors); ensure you modify the code that calls tool_call
"fts_delete_index" and the subsequent extract_tool_result/check logic to
implement the chosen approach.

In `@scripts/mcp/test_mcp_fts.sh`:
- Around line 205-240: The current fallback JSON parsing in extract_json_field
and check_json_bool is broken for jq-style dot-notation and complex paths (e.g.,
".success", ".results | length") and silently fails; update the script to
enforce jq presence at startup (fail fast with a clear error) OR rewrite the
fallback to canonicalize the field by stripping leading dots and support simple
dot-path traversal and array length checks used by extract_inner_result;
specifically modify the startup check to call command -v jq and exit with an
informative message if missing, and update extract_json_field/check_json_bool to
delegate to jq-only or to parse dot paths by splitting on '.' and traversing
nested objects/arrays so patterns like .success and .results|length return
correct values.
- Around line 509-514: The MCP tool responses are wrapped; replace uses of
extract_json_field with extract_inner_field when pulling numeric fields from the
wrapped fts_tool_call response (e.g., change total_matches, results | length,
row_count, rebuilt_count extractions). Locate occurrences where
extract_json_field is used alongside variables like total_matches, result_count,
row_count, rebuilt_count (and where check_json_bool already uses
extract_inner_result) and swap to extract_inner_field so the code unwraps the
MCP JSON-RPC result before evaluating numeric comparisons; keep the same
variable names and call sites (e.g., where response is parsed after
fts_tool_call and in routines that follow check_json_bool).

In `@test/tap/tests/Makefile`:
- Around line 171-173: The Makefile target anomaly_detection-t currently passes
the misspelled preprocessor define -DEXCLUDE_TRACKING_VARAIABLES; update that
flag to the correctly spelled -DEXCLUDE_TRACKING_VARIABLES in the
anomaly_detection-t rule so the code that checks for EXCLUDE_TRACKING_VARIABLES
is enabled as intended.

In `@test/tap/tests/vector_db_performance-t.cpp`:
- Around line 380-381: The second diagnostic uses the wrong/ inconsistent printf
format for total_duration.count(); change the format to match the 64-bit type
(use %lld) and ensure you cast total_duration.count() to (long long) in the diag
call so both diag("Average time per concurrent operation: %lld microseconds",
avg_time) and diag("Total time for %d operations: %lld microseconds",
num_operations, (long long)total_duration.count()) use the same 64-bit specifier
and types; locate these calls (diag with avg_time and diag with
total_duration.count()) and update the format/specifier and cast accordingly.
- Line 325: The diag call uses the wrong printf format specifier for
insert_duration.count(); update the format to safely match the underlying type
by either using "%lld" and casting insert_duration.count() to (long long) or by
casting to long and using "%ld" consistently; change the diag invocation that
prints "Time to insert %zu entries: %ld ms" to use a matching specifier and cast
(e.g., "%zu entries: %lld ms", (long long)insert_duration.count()) so the format
matches insert_duration.count() and aligns with the other timing output.
🧹 Nitpick comments (8)
doc/MCP/FTS_USER_GUIDE.md (1)

1-1: Consider hyphenating "Full-Text Search" for grammatical correctness.

"Full-Text Search" is a compound adjective modifying "User Guide" and should be hyphenated for consistency with standard technical writing conventions. This applies to both the title and the Overview section (line 19).

Suggested fix
-# MCP Full Text Search (FTS) - User Guide
+# MCP Full-Text Search (FTS) - User Guide

And similarly on line 19:

-The MCP Full Text Search (FTS) module provides fast, indexed search capabilities
+The MCP Full-Text Search (FTS) module provides fast, indexed search capabilities
scripts/mcp/test_mcp_fts.sh (2)

147-150: Add curl timeouts to avoid hanging test runs.

If the MCP endpoint is down, curl can block indefinitely. Add --connect-timeout/--max-time (optionally configurable via env).

⏱️ Suggested timeout defaults
-    response=$(curl -s -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \
+    response=$(curl -s --connect-timeout 5 --max-time 30 -w "\n%{http_code}" -X POST "${MCP_ENDPOINT}" \
         -H "Content-Type: application/json" \
         -d "${payload}" 2>/dev/null)

352-355: Drop unused delete_response.

Shellcheck flags delete_response as unused. Either log failures or remove the variable.

🧹 Remove unused variable
-    local delete_response
-    delete_response=$(fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}")
+    fts_tool_call "fts_delete_index" "{\"schema\": \"${TEST_SCHEMA}\", \"table\": \"${TEST_TABLE}\"}" >/dev/null
lib/Query_Tool_Handler.cpp (1)

220-227: Tool schema marks columns as string, but runtime expects an array.

create_tool_schema types required params as strings, yet fts_index_table expects a JSON array (tests send an array). This mismatch can confuse schema-driven clients. Consider a custom schema for this tool where columns is an array of strings.

♻️ Example custom schema for fts_index_table
-    tools.push_back(create_tool_schema(
-        "fts_index_table",
-        "Create and populate a full-text search index for a MySQL table",
-        {"schema", "table", "columns", "primary_key"},
-        {{"where_clause", "string"}}
-    ));
+    {
+        json schema = {
+            {"type", "object"},
+            {"properties", {
+                {"schema", {{"type", "string"}, {"description", "schema parameter"}}},
+                {"table", {{"type", "string"}, {"description", "table parameter"}}},
+                {"columns", {{"type", "array"}, {"items", {{"type", "string"}}}, {"description", "columns to index"}}},
+                {"primary_key", {{"type", "string"}, {"description", "primary_key parameter"}}},
+                {"where_clause", {{"type", "string"}, {"description", "where_clause parameter"}}}
+            }},
+            {"required", {"schema", "table", "columns", "primary_key"}}
+        };
+        tools.push_back(create_tool_description(
+            "fts_index_table",
+            "Create and populate a full-text search index for a MySQL table",
+            schema
+        ));
+    }
include/MySQL_FTS.h (1)

90-96: Delete copy and move operations to prevent accidental resource duplication.

The class owns a raw SQLite3DB* pointer that is deleted in the destructor. Since copy/move operations are not explicitly deleted, they're implicitly defined, which means shallow copies could lead to double-free or use-after-free if the class is ever used by value or moved. Although the current codebase only uses pointer instances, it's defensive to explicitly prevent copying/moving.

🔒 Disable copy/move
  MySQL_FTS(const std::string& path);

+ MySQL_FTS(const MySQL_FTS&) = delete;
+ MySQL_FTS& operator=(const MySQL_FTS&) = delete;
+ MySQL_FTS(MySQL_FTS&&) = delete;
+ MySQL_FTS& operator=(MySQL_FTS&&) = delete;

  /**
   * `@brief` Destructor
   */
  ~MySQL_FTS();
lib/MySQL_Tool_Handler.cpp (2)

117-138: Redundant implementations of FTS path reset logic.

Both reset_fts_path() (lines 117-138) and reinit_fts() (lines 1142-1179) implement nearly identical functionality for swapping the FTS instance. The key differences are:

  • reinit_fts() additionally checks if the directory exists
  • reinit_fts() returns int while reset_fts_path() returns bool

Consider consolidating into a single implementation to avoid code duplication and potential divergence in behavior.

♻️ Suggested consolidation
 bool MySQL_Tool_Handler::reset_fts_path(const std::string& path) {
-	MySQL_FTS* new_fts = NULL;
-
-	// Initialize new FTS outside lock (blocking I/O)
-	if (!path.empty()) {
-		new_fts = new MySQL_FTS(path);
-		if (new_fts->init()) {
-			proxy_error("Failed to initialize FTS with new path: %s\n", path.c_str());
-			delete new_fts;
-			return false;
-		}
-	}
-
-	// Swap pointer under lock (non-blocking)
-	pthread_mutex_lock(&fts_lock);
-	MySQL_FTS* old_fts = fts;
-	fts = new_fts;
-	pthread_mutex_unlock(&fts_lock);
-	if (old_fts) delete old_fts;
-
-	return true;
+	return reinit_fts(path) == 0;
 }

Then update reinit_fts() to handle empty path case if needed, or keep reset_fts_path() as the primary implementation and have reinit_fts() call it after the directory check.

Also applies to: 1142-1179


303-329: Remove debug fprintf statements before merging.

Multiple debug fprintf(stderr, ...) statements remain throughout the file (e.g., lines 303, 309, 317, 319, 325, 329, 343-356, 401, 413, 419, 493-532, 537-566). The PR objectives mention "Code cleanup: removed seven debug fprintfs" but many still remain.

These should be removed or converted to proper proxy_debug() calls for production code.

Also applies to: 343-356, 401-419, 493-532, 537-566

lib/MySQL_FTS.cpp (1)

359-430: Database lock held during entire indexing operation.

The wrlock() acquired at line 359 is held until line 430, encompassing potentially thousands of row insertions. This could block read operations (search, list_indexes) for extended periods.

Consider committing in batches and releasing/reacquiring the lock between batches, or use a transaction with checkpoints.

Comment on lines 350 to 364
if (!strcmp(name, "fts_path")) {
if (variables.mcp_fts_path)
free(variables.mcp_fts_path);
variables.mcp_fts_path = strdup(value);
// Apply at runtime by resetting FTS in the existing handler
if (mysql_tool_handler) {
proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
if (!mysql_tool_handler->reset_fts_path(value)) {
proxy_error("Failed to reset FTS path at runtime\n");
return -1;
}
}

return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rollback mcp_fts_path on reset failure to keep config consistent.

set_variable("fts_path") updates the stored path before calling reset_fts_path. If reset fails, the function returns -1 but leaves the new path saved, so config no longer reflects the active FTS DB. Preserve the previous value and restore on failure.

🔧 Preserve previous value on failure
-    if (variables.mcp_fts_path)
-        free(variables.mcp_fts_path);
-    variables.mcp_fts_path = strdup(value);
-    // Apply at runtime by resetting FTS in the existing handler
-    if (mysql_tool_handler) {
-        proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
-        if (!mysql_tool_handler->reset_fts_path(value)) {
-            proxy_error("Failed to reset FTS path at runtime\n");
-            return -1;
-        }
-    }
-
-    return 0;
+    char* old = variables.mcp_fts_path;
+    variables.mcp_fts_path = strdup(value);
+    // Apply at runtime by resetting FTS in the existing handler
+    if (mysql_tool_handler) {
+        proxy_info("MCP: Applying new fts_path at runtime: %s\n", value);
+        if (!mysql_tool_handler->reset_fts_path(value)) {
+            proxy_error("Failed to reset FTS path at runtime\n");
+            free(variables.mcp_fts_path);
+            variables.mcp_fts_path = old;
+            return -1;
+        }
+    }
+    if (old) free(old);
+    return 0;
🤖 Prompt for AI Agents
In `@lib/MCP_Thread.cpp` around lines 350 - 364, The code assigns
variables.mcp_fts_path = strdup(value) before calling
mysql_tool_handler->reset_fts_path, so on reset failure the stored path is left
inconsistent; change the logic in the "fts_path" branch to preserve the previous
value: save the old pointer (old = variables.mcp_fts_path), create a new copy
(new_val = strdup(value)) but do not overwrite variables.mcp_fts_path until
reset succeeds; call mysql_tool_handler->reset_fts_path(value) and on success
assign variables.mcp_fts_path = new_val (free old if non-null), on failure free
new_val and leave variables.mcp_fts_path as old, log the error and return -1;
ensure no leaks or double-frees when old is null.

Comment on lines 92 to 109
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
// Prevent leading digit (SQLite identifiers can't start with digit)
if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

sanitize_name() may return empty string causing issues.

If the input contains only invalid characters, sanitize_name() returns an empty string. This could lead to invalid table names like fts_data__ or trigger name collisions.

🔧 Suggested fix
 std::string MySQL_FTS::sanitize_name(const std::string& name) {
 	const size_t MAX_NAME_LEN = 100;
 	std::string sanitized;
 	// Allowlist: only ASCII letters, digits, underscore
 	for (char c : name) {
 		if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
 		    (c >= '0' && c <= '9') || c == '_') {
 			sanitized.push_back(c);
 		}
 	}
+	// Return fallback if empty
+	if (sanitized.empty()) {
+		sanitized = "_unnamed";
+	}
 	// Prevent leading digit (SQLite identifiers can't start with digit)
-	if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
+	if (sanitized[0] >= '0' && sanitized[0] <= '9') {
 		sanitized.insert(sanitized.begin(), '_');
 	}
 	// Enforce maximum length
 	if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
 	return sanitized;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
// Prevent leading digit (SQLite identifiers can't start with digit)
if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;
}
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
// Return fallback if empty
if (sanitized.empty()) {
sanitized = "_unnamed";
}
// Prevent leading digit (SQLite identifiers can't start with digit)
if (sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;
}
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 92 - 109, sanitize_name can return an empty
string when input has only invalid characters; update MySQL_FTS::sanitize_name
to guarantee a non-empty, collision-resistant identifier by adding a
deterministic fallback: after sanitization and truncation (use MAX_NAME_LEN), if
result is empty or would collide, prepend or replace with a safe prefix (e.g.,
"_") and append a short unique suffix derived from the original name (use
std::hash<std::string> of the original input truncated to a few hex chars) so
symbols like sanitize_name and MAX_NAME_LEN are used but the function always
returns a valid, bounded-length, and effectively unique name.

Comment on lines 326 to 338
// Build MySQL query to fetch data
std::ostringstream mysql_query;
mysql_query << "SELECT ";
for (size_t i = 0; i < selected_cols.size(); i++) {
if (i > 0) mysql_query << ", ";
mysql_query << "`" << selected_cols[i] << "`";
}

mysql_query << " FROM `" << schema << "`.`" << table << "`";

if (!where_clause.empty()) {
mysql_query << " WHERE " << where_clause;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

SQL injection risk: User-provided where_clause directly interpolated into query.

The where_clause parameter is directly concatenated into the MySQL query without validation or escaping (line 337). While the PR objectives mention security hardening, this vector remains open.

An attacker could inject arbitrary SQL through the where_clause parameter passed to fts_index_table.

🔒 Suggested mitigation

Consider one of these approaches:

  1. Validate that where_clause contains only safe SQL predicates
  2. Use parameterized queries if the MySQL client library supports it
  3. At minimum, document the security implications and ensure this is only exposed to trusted callers
+	// WARNING: where_clause is passed directly to MySQL - ensure caller validates input
 	if (!where_clause.empty()) {
+		// Basic sanity check - reject obviously dangerous patterns
+		std::string upper_where = where_clause;
+		std::transform(upper_where.begin(), upper_where.end(), upper_where.begin(), ::toupper);
+		if (upper_where.find("INTO OUTFILE") != std::string::npos ||
+		    upper_where.find("LOAD_FILE") != std::string::npos) {
+			result["error"] = "Dangerous pattern in where_clause";
+			return result.dump();
+		}
 		mysql_query << " WHERE " << where_clause;
 	}
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 326 - 338, The code in MySQL_FTS.cpp
concatenates the user-provided where_clause directly into the SQL built in the
fts_index_table flow (see mysql_query construction using selected_cols, schema,
table and where_clause), creating an SQL injection risk; fix it by changing the
query construction to use parameterized/prepared statements from your MySQL
client API (build the SELECT with column/backtick quoting as now, but replace
dynamic predicates in where_clause with named or positional placeholders and
bind the corresponding values), or if placeholders cannot represent arbitrary
predicates, enforce a strict whitelist/validation of allowed columns/operators
and parse/escape values before concatenation; locate the code that forms
mysql_query and refactor to use the client’s prepare/bind/execute API or add
validation logic for where_clause and document the expected safe format for
callers.

Comment on lines 329 to 334
for (size_t i = 0; i < selected_cols.size(); i++) {
if (i > 0) mysql_query << ", ";
mysql_query << "`" << selected_cols[i] << "`";
}

mysql_query << " FROM `" << schema << "`.`" << table << "`";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Identifiers in MySQL query not properly escaped.

The column names from selected_cols and the schema/table names are wrapped in backticks but not escaped for embedded backticks. If a column name contains a backtick, this could cause SQL syntax errors or injection.

The code has escape_identifier() for SQLite but doesn't apply equivalent escaping for MySQL queries.

🔧 Suggested fix

Add a MySQL identifier escape helper or reuse the pattern from escape_identifier():

// Helper for MySQL identifiers (double backticks)
std::string escape_mysql_identifier(const std::string& id) {
    std::string escaped = "`";
    for (char c : id) {
        escaped.push_back(c);
        if (c == '`') escaped.push_back('`');
    }
    escaped.push_back('`');
    return escaped;
}

Then use it:

 	for (size_t i = 0; i < selected_cols.size(); i++) {
 		if (i > 0) mysql_query << ", ";
-		mysql_query << "`" << selected_cols[i] << "`";
+		mysql_query << escape_mysql_identifier(selected_cols[i]);
 	}
-	mysql_query << " FROM `" << schema << "`.`" << table << "`";
+	mysql_query << " FROM " << escape_mysql_identifier(schema) << "." << escape_mysql_identifier(table);
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 329 - 334, The MySQL identifier usage in the
selected_cols loop and the final FROM clause is quoting names with backticks but
not escaping embedded backticks; add an escape helper (e.g.,
escape_mysql_identifier) mirroring escape_identifier() behavior that duplicates
any internal '`' and wraps the result in surrounding backticks, then replace
usages building mysql_query for selected_cols, schema and table to call
escape_mysql_identifier(selected_cols[i]), escape_mysql_identifier(schema), and
escape_mysql_identifier(table) so all identifiers are safely escaped.

Comment on lines 483 to 493
db->rdlock();
indexes_result = db->execute_statement(indexes_sql.str().c_str(), &error, &cols, &affected);

if (!indexes_result || indexes_result->rows.empty()) {
db->rdunlock();
result["success"] = true;
result["query"] = query;
result["total_matches"] = 0;
result["results"] = json::array();
return result.dump();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Memory leak: indexes_result not freed before early return.

If indexes_result is non-null but has empty rows, the code returns early (line 492) without freeing indexes_result.

🐛 Proposed fix
 	if (!indexes_result || indexes_result->rows.empty()) {
 		db->rdunlock();
+		if (indexes_result) delete indexes_result;
 		result["success"] = true;
 		result["query"] = query;
 		result["total_matches"] = 0;
 		result["results"] = json::array();
 		return result.dump();
 	}
🤖 Prompt for AI Agents
In `@lib/MySQL_FTS.cpp` around lines 483 - 493, The early-return path that handles
(!indexes_result || indexes_result->rows.empty()) leaks memory when
indexes_result is non-null; modify that branch so that if indexes_result is
non-null you free/delete/cleanup it (e.g., delete indexes_result or call its
provided free method) before returning, and keep db->rdunlock() in the correct
order (free indexes_result if non-null, then rdunlock(), then set result fields
and return). Ensure you reference and handle the indexes_result variable from
the execute_statement call.

Comment on lines +205 to +240
# Extract JSON field value
extract_json_field() {
local response="$1"
local field="$2"

if command -v jq >/dev/null 2>&1; then
echo "${response}" | jq -r "${field}" 2>/dev/null || echo ""
else
# Fallback to grep/sed for basic JSON parsing
echo "${response}" | grep -o "\"${field}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | sed 's/.*: "\(.*\)"/\1/' || echo ""
fi
}

# Check JSON boolean field
check_json_bool() {
local response="$1"
local field="$2"
local expected="$3"

# Extract inner result from double-nested structure
local inner_result
inner_result=$(extract_inner_result "${response}")

if command -v jq >/dev/null 2>&1; then
local actual
actual=$(echo "${inner_result}" | jq -r "${field}" 2>/dev/null)
[ "${actual}" = "${expected}" ]
else
# Fallback: check for true/false string
if [ "${expected}" = "true" ]; then
echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*true"
else
echo "${inner_result}" | grep -q "\"${field}\"[[:space:]]*:[[:space:]]*false"
fi
fi
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the target lines and surrounding context
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '200,250p'

Repository: ProxySQL/proxysql-vec

Length of output: 186


🏁 Script executed:

# Also check the second location mentioned
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '1280,1300p'

Repository: ProxySQL/proxysql-vec

Length of output: 608


🏁 Script executed:

# Search for all calls to extract_json_field to see what field parameters are passed
rg "extract_json_field" scripts/mcp/test_mcp_fts.sh -A 2 -B 2

Repository: ProxySQL/proxysql-vec

Length of output: 2143


🏁 Script executed:

# Search for all calls to check_json_bool to see what field parameters are passed
rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -A 2 -B 2

Repository: ProxySQL/proxysql-vec

Length of output: 3831


🏁 Script executed:

# Verify the actual implementation matches the snippet - read exact lines 205-240
cat -n scripts/mcp/test_mcp_fts.sh | sed -n '205,240p'

Repository: ProxySQL/proxysql-vec

Length of output: 1463


🏁 Script executed:

# Check if there's an extract_inner_result function to understand the full context
rg "extract_inner_result" scripts/mcp/test_mcp_fts.sh -A 5

Repository: ProxySQL/proxysql-vec

Length of output: 694


jq is de facto required but not enforced; the grep/sed fallback is non-functional and will silently fail tests.

The fallback JSON parsing in extract_json_field and check_json_bool cannot work with jq dot-notation fields (e.g., .success, .total_matches, .results | length). Line 214 and lines 235/237 search grep for literal "${field}", which becomes patterns like "\.success" that never match JSON keys like "success". Complex paths with pipes (.results | length, .indexes[] | select(...)) are unsupported entirely. Without jq, all extraction calls return empty strings silently, causing tests to fail or pass incorrectly. Add a pre-flight jq requirement check at lines 1284–1295, or redesign the fallback to strip leading dots and support basic path traversal.

🔧 Suggested pre-flight guard (make jq mandatory)
    # Pre-flight checks
    log_section "Pre-flight Checks"

+   if ! command -v jq >/dev/null 2>&1; then
+       log_error "jq is required for this test suite."
+       exit 1
+   fi
+
    if ! check_mcp_server; then
        log_error "MCP server check failed. Exiting."
        exit 1
    fi
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 205 - 240, The current fallback
JSON parsing in extract_json_field and check_json_bool is broken for jq-style
dot-notation and complex paths (e.g., ".success", ".results | length") and
silently fails; update the script to enforce jq presence at startup (fail fast
with a clear error) OR rewrite the fallback to canonicalize the field by
stripping leading dots and support simple dot-path traversal and array length
checks used by extract_inner_result; specifically modify the startup check to
call command -v jq and exit with an informative message if missing, and update
extract_json_field/check_json_bool to delegate to jq-only or to parse dot paths
by splitting on '.' and traversing nested objects/arrays so patterns like
.success and .results|length return correct values.

Comment on lines +509 to +514
# Check results
local total_matches
local result_count
total_matches=$(extract_json_field "${response}" ".total_matches")
result_count=$(extract_json_field "${response}" ".results | length")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 97


🏁 Script executed:

sed -n '505,520p' scripts/mcp/test_mcp_fts.sh
sed -n '535,550p' scripts/mcp/test_mcp_fts.sh
sed -n '565,575p' scripts/mcp/test_mcp_fts.sh
sed -n '590,605p' scripts/mcp/test_mcp_fts.sh
sed -n '690,705p' scripts/mcp/test_mcp_fts.sh
sed -n '773,785p' scripts/mcp/test_mcp_fts.sh
sed -n '850,865p' scripts/mcp/test_mcp_fts.sh
sed -n '930,945p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 3718


🏁 Script executed:

grep -n "^extract_json_field\|^extract_inner_field\|^extract_inner_result" scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 148


🏁 Script executed:

rg "function extract_json_field|function extract_inner_field|function extract_inner_result" scripts/mcp/ -A 5

Repository: ProxySQL/proxysql-vec

Length of output: 47


🏁 Script executed:

rg "extract_json_field|extract_inner_field|extract_inner_result" scripts/mcp/ -B 2 -A 2 | head -100

Repository: ProxySQL/proxysql-vec

Length of output: 5618


🏁 Script executed:

rg "^fts_tool_call\(|function fts_tool_call" scripts/mcp/ -A 10

Repository: ProxySQL/proxysql-vec

Length of output: 573


🏁 Script executed:

sed -n '500,520p' scripts/mcp/test_mcp_fts.sh
sed -n '530,555p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 1494


🏁 Script executed:

sed -n '156,195p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 942


🏁 Script executed:

rg "check_json_bool" scripts/mcp/test_mcp_fts.sh -B 2 -A 5 | head -80

Repository: ProxySQL/proxysql-vec

Length of output: 1886


🏁 Script executed:

sed -n '200,280p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 2369


🏁 Script executed:

sed -n '305,330p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 1070


🏁 Script executed:

sed -n '690,710p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 732


🏁 Script executed:

sed -n '850,870p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 663


🏁 Script executed:

sed -n '775,785p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 426


🏁 Script executed:

sed -n '930,945p' scripts/mcp/test_mcp_fts.sh

Repository: ProxySQL/proxysql-vec

Length of output: 551


🏁 Script executed:

# Let's also verify instances that correctly use extract_inner_field to see the pattern
rg "extract_inner_field.*total_matches|extract_inner_field.*row_count" scripts/mcp/test_mcp_fts.sh -B 3 -A 1

Repository: ProxySQL/proxysql-vec

Length of output: 588


Use extract_inner_field consistently for MCP tool result fields.

Tool responses from fts_tool_call are nested in the MCP JSON-RPC structure. While check_json_bool correctly unwraps them via extract_inner_result, the flagged lines extract other fields like .total_matches and .row_count using extract_json_field, which operates on the wrapped response and yields empty/null values. This causes numeric comparisons to fail silently.

Replace extract_json_field with extract_inner_field at these locations:

  • Lines 509-514: .total_matches, .results | length
  • Lines 541-543: .total_matches
  • Lines 568-570: .total_matches
  • Lines 596-598: .total_matches
  • Lines 695-698: .total_matches
  • Lines 778-779: .row_count
  • Lines 855-857: .rebuilt_count
  • Lines 935-936: .total_matches
🔧 Example fix (apply similarly to all flagged locations)
-    total_matches=$(extract_json_field "${response}" ".total_matches")
-    result_count=$(extract_json_field "${response}" ".results | length")
+    total_matches=$(extract_inner_field "${response}" ".total_matches")
+    result_count=$(extract_inner_field "${response}" ".results | length")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Check results
local total_matches
local result_count
total_matches=$(extract_json_field "${response}" ".total_matches")
result_count=$(extract_json_field "${response}" ".results | length")
# Check results
local total_matches
local result_count
total_matches=$(extract_inner_field "${response}" ".total_matches")
result_count=$(extract_inner_field "${response}" ".results | length")
🤖 Prompt for AI Agents
In `@scripts/mcp/test_mcp_fts.sh` around lines 509 - 514, The MCP tool responses
are wrapped; replace uses of extract_json_field with extract_inner_field when
pulling numeric fields from the wrapped fts_tool_call response (e.g., change
total_matches, results | length, row_count, rebuilt_count extractions). Locate
occurrences where extract_json_field is used alongside variables like
total_matches, result_count, row_count, rebuilt_count (and where check_json_bool
already uses extract_inner_result) and swap to extract_inner_field so the code
unwraps the MCP JSON-RPC result before evaluating numeric comparisons; keep the
same variable names and call sites (e.g., where response is parsed after
fts_tool_call and in routines that follow check_json_bool).

Comment on lines 171 to 173
anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and check the specific lines
echo "=== Checking Makefile lines 171-173 ==="
sed -n '171,173p' test/tap/tests/Makefile

echo ""
echo "=== Checking Makefile lines 196 (and surrounding context) ==="
sed -n '194,198p' test/tap/tests/Makefile

echo ""
echo "=== Checking which spelling appears in the Makefile ==="
grep -n "EXCLUDE_TRACKING_VAR" test/tap/tests/Makefile

Repository: ProxySQL/proxysql-vec

Length of output: 1906


🏁 Script executed:

# Search the codebase for both spellings
echo "=== Checking for EXCLUDE_TRACKING_VARAIABLES (misspelled) ==="
rg "EXCLUDE_TRACKING_VARAIABLES" --type cpp --type h

echo ""
echo "=== Checking for EXCLUDE_TRACKING_VARIABLES (correct) ==="
rg "EXCLUDE_TRACKING_VARIABLES" --type cpp --type h

Repository: ProxySQL/proxysql-vec

Length of output: 372


Fix typo in preprocessor define: EXCLUDE_TRACKING_VARAIABLES should be EXCLUDE_TRACKING_VARIABLES.

Line 172 uses -DEXCLUDE_TRACKING_VARAIABLES (misspelled), while the codebase defines and checks for -DEXCLUDE_TRACKING_VARIABLES (correct spelling). This typo means the intended preprocessor guard won't work as the code never references the misspelled symbol.

Suggested fix
 anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
-	$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
+	$(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
$(CXX) -DEXCLUDE_TRACKING_VARAIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
anomaly_detection-t: anomaly_detection-t.cpp $(TAP_LDIR)/libtap.so
$(CXX) -DEXCLUDE_TRACKING_VARIABLES $< ../tap/SQLite3_Server.cpp -I$(CLICKHOUSE_CPP_IDIR) $(IDIRS) $(LDIRS) -L$(CLICKHOUSE_CPP_LDIR) -L$(LZ4_LDIR) $(OPT) $(OBJ) $(MYLIBSJEMALLOC) $(MYLIBS) $(STATIC_LIBS) $(CLICKHOUSE_CPP_LDIR)/libclickhouse-cpp-lib.a $(CLICKHOUSE_CPP_PATH)/contrib/zstd/zstd/libzstdstatic.a $(LZ4_LDIR)/liblz4.a $(SQLITE3_LDIR)/../libsqlite_rembed.a -lscram -lusual -Wl,--allow-multiple-definition -o $@
🤖 Prompt for AI Agents
In `@test/tap/tests/Makefile` around lines 171 - 173, The Makefile target
anomaly_detection-t currently passes the misspelled preprocessor define
-DEXCLUDE_TRACKING_VARAIABLES; update that flag to the correctly spelled
-DEXCLUDE_TRACKING_VARIABLES in the anomaly_detection-t rule so the code that
checks for EXCLUDE_TRACKING_VARIABLES is enabled as intended.


ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Incorrect format specifier may cause undefined behavior.

std::chrono::milliseconds::rep is typically int64_t/long long, not long. Using %ld is incorrect on LLP64 platforms (e.g., Windows 64-bit) where long is 32 bits. This also creates inconsistency with line 332 which still uses %lld for similar timing output.

Suggested fix: revert to %lld or cast explicitly
-	diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
+	diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));
🤖 Prompt for AI Agents
In `@test/tap/tests/vector_db_performance-t.cpp` at line 325, The diag call uses
the wrong printf format specifier for insert_duration.count(); update the format
to safely match the underlying type by either using "%lld" and casting
insert_duration.count() to (long long) or by casting to long and using "%ld"
consistently; change the diag invocation that prints "Time to insert %zu
entries: %ld ms" to use a matching specifier and cast (e.g., "%zu entries: %lld
ms", (long long)insert_duration.count()) so the format matches
insert_duration.count() and aligns with the other timing output.

Comment on lines 380 to 381
diag("Average time per concurrent operation: %lld microseconds", avg_time);
diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent and incorrect format specifier.

Line 380 correctly uses %lld for avg_time (long long), but line 381 uses %ld for total_duration.count() which also returns a 64-bit integer type. This inconsistency and type mismatch should be corrected.

Suggested fix
 	diag("Average time per concurrent operation: %lld microseconds", avg_time);
-	diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
+	diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));
🤖 Prompt for AI Agents
In `@test/tap/tests/vector_db_performance-t.cpp` around lines 380 - 381, The
second diagnostic uses the wrong/ inconsistent printf format for
total_duration.count(); change the format to match the 64-bit type (use %lld)
and ensure you cast total_duration.count() to (long long) in the diag call so
both diag("Average time per concurrent operation: %lld microseconds", avg_time)
and diag("Total time for %d operations: %lld microseconds", num_operations,
(long long)total_duration.count()) use the same 64-bit specifier and types;
locate these calls (diag with avg_time and diag with total_duration.count()) and
update the format/specifier and cast accordingly.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses security, memory safety, and thread safety issues identified in PR #21 code reviews for the Full-Text Search (FTS) functionality. It also includes additional cherry-picked fixes for test compilation issues.

Changes:

  • Security fixes: SQL injection prevention via identifier escaping and FTS5 query wrapping, allowlist-based name sanitization
  • Memory safety fixes: VLA replacement with std::vector, memory leak fixes for error strings
  • Thread safety fixes: mutex protection for FTS pointer swaps and runtime path changes
  • Test improvements: format specifier corrections, missing includes, build rule additions
  • New comprehensive test scripts for FTS functionality validation

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/tap/tests/vector_db_performance-t.cpp Fixed printf format specifiers for duration types, added missing includes
test/tap/tests/anomaly_detection-t.cpp Added stub function for SQLite3_Server_session_handler
test/tap/tests/ai_llm_retry_scenarios-t.cpp Added missing string header include
test/tap/tests/Makefile Added build rule for anomaly_detection-t test
scripts/mcp/test_mcp_fts_detailed.sh New detailed FTS validation script (has security issue)
scripts/mcp/test_mcp_fts.sh New comprehensive FTS test script
lib/MySQL_FTS.cpp New FTS implementation with security fixes
lib/MySQL_FTS.h New FTS class header
lib/MySQL_Tool_Handler.cpp FTS integration with mutex protection
lib/MySQL_Tool_Handler.h FTS method declarations
lib/Query_Tool_Handler.cpp FTS tool registration, debug statement removal
lib/ProxySQL_MCP_Server.cpp FTS path configuration parameter
lib/MCP_Thread.cpp FTS path variable handling
lib/MCP_Thread.h FTS path variable declaration
lib/Makefile Added MySQL_FTS.oo to build
include/MySQL_Tool_Handler.h FTS declarations and method signatures
include/MySQL_FTS.h Complete FTS class interface
include/MCP_Thread.h FTS path variable in struct
doc/MCP/FTS_USER_GUIDE.md Comprehensive FTS user guide

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


diag("Average time per concurrent operation: %lld microseconds", avg_time);
diag("Total time for %d operations: %lld microseconds", num_operations, total_duration.count());
diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(total_duration.count()) for portability.

Suggested change
diag("Total time for %d operations: %ld microseconds", num_operations, total_duration.count());
diag("Total time for %d operations: %lld microseconds", num_operations, static_cast<long long>(total_duration.count()));

Copilot uses AI. Check for mistakes.

mysql_exec() {
local sql="$1"
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MySQL password is exposed in the command line arguments using -p"${MYSQL_PASSWORD}". This is a security vulnerability as the password can be seen in process listings by other users. The script should use the MYSQL_PWD environment variable instead, which is already done in other parts of the script. Change this line to: MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"

Suggested change
mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -p"${MYSQL_PASSWORD}" -e "${sql}"
MYSQL_PWD="${MYSQL_PASSWORD}" mysql -h "${MYSQL_HOST}" -P "${MYSQL_PORT}" -u "${MYSQL_USER}" -e "${sql}"

Copilot uses AI. Check for mistakes.

if (error || !resultset || resultset->rows.empty()) {
result["error"] = "Index not found for " + schema + "." + table;
if (resultset) delete resultset;
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error cleanup in reindex function. If an error is set from execute_statement (line 726), it should be freed before returning. Add if (error) (*proxy_sqlite3_free)(error); after line 730 and before returning the error result.

Suggested change
if (resultset) delete resultset;
if (resultset) delete resultset;
if (error) (*proxy_sqlite3_free)(error);

Copilot uses AI. Check for mistakes.
Comment on lines 92 to 109
std::string MySQL_FTS::sanitize_name(const std::string& name) {
const size_t MAX_NAME_LEN = 100;
std::string sanitized;
// Allowlist: only ASCII letters, digits, underscore
for (char c : name) {
if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
(c >= '0' && c <= '9') || c == '_') {
sanitized.push_back(c);
}
}
// Prevent leading digit (SQLite identifiers can't start with digit)
if (!sanitized.empty() && sanitized[0] >= '0' && sanitized[0] <= '9') {
sanitized.insert(sanitized.begin(), '_');
}
// Enforce maximum length
if (sanitized.length() > MAX_NAME_LEN) sanitized = sanitized.substr(0, MAX_NAME_LEN);
return sanitized;
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sanitize_name function could return an empty string if the input contains no valid characters. An empty sanitized name could lead to issues when constructing table names. Consider adding validation to ensure the sanitized name is not empty, or prepend a default prefix like "fts_" when the result is empty.

Copilot uses AI. Check for mistakes.

ok(db.size() == large_size, "Large dataset (%zu entries) inserted successfully", large_size);
diag("Time to insert %zu entries: %lld ms", large_size, insert_duration.count());
diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The printf format specifier %lld is correct for long long, but std::chrono::duration::count() returns rep type which is implementation-defined. On some platforms it may be long rather than long long, causing format mismatch warnings. Consider using static_cast<long long>(insert_duration.count()) for portability.

Suggested change
diag("Time to insert %zu entries: %ld ms", large_size, insert_duration.count());
diag("Time to insert %zu entries: %lld ms", large_size, static_cast<long long>(insert_duration.count()));

Copilot uses AI. Check for mistakes.
This commit addresses all recommendations from CodeRabbit, Gemini Code Assist,
and Copilot for PR #25 (FTS security and code quality improvements).

Critical Security Fixes:
- MCP_Thread.cpp: Rollback fts_path on reset failure to keep config consistent
- MySQL_FTS.cpp: Add escape_mysql_identifier() for MySQL query identifier escaping
- MySQL_FTS.cpp: Add unique hash-based fallback to sanitize_name() for empty strings
- MySQL_FTS.cpp: Add where_clause validation to block dangerous SQL patterns

Memory Safety Fixes:
- MySQL_FTS.cpp: Fix indexes_result memory leak on early return in search()
- MySQL_FTS.h: Delete copy/move operations to prevent accidental resource duplication

Thread Safety Documentation:
- MySQL_Tool_Handler.cpp: Add comment explaining FTS lock design rationale

Test Script Improvements:
- test_mcp_fts.sh: Add curl timeouts (5s connect, 30s max)
- test_mcp_fts.sh: Remove unused delete_response variable
- test_mcp_fts_detailed.sh: Make cleanup tolerant of non-existent indexes

Build Fixes:
- Makefile: Fix EXCLUDE_TRACKING_VARAIABLES typo to EXCLUDE_TRACKING_VARIABLES
- vector_db_performance-t.cpp: Fix printf format specifiers to %lld with cast

Schema Fixes:
- Query_Tool_Handler.cpp: Change fts_index_table columns schema from string to array

Code Cleanup:
- MySQL_Tool_Handler.cpp: Remove all remaining debug fprintf statements (34 lines)
- Documentation: Change "Full Text" to "Full-Text" (hyphenated)

Total: ~50 fixes across 10 files
@renecannao renecannao merged commit f257c6a into v3.1-vec Jan 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants