Skip to content

fix(chat): simplify DANGEROUS_QUERY_PATTERN to only block *:* queries#3059

Merged
marevol merged 1 commit intomasterfrom
fix/simplify-dangerous-query-pattern
Mar 7, 2026
Merged

fix(chat): simplify DANGEROUS_QUERY_PATTERN to only block *:* queries#3059
marevol merged 1 commit intomasterfrom
fix/simplify-dangerous-query-pattern

Conversation

@marevol
Copy link
Contributor

@marevol marevol commented Mar 7, 2026

Summary

  • Reduced the overly broad DANGEROUS_QUERY_PATTERN that was blocking legitimate search queries
  • The previous pattern rejected queries containing keywords like _source, _all, painless, or groovy in natural-language context (e.g., "painless scripting tutorial")
  • Now only rejects the *:* wildcard-all pattern, which is the actual security concern for unrestricted document dumps

Changes Made

  • ChatClient.java: Simplified DANGEROUS_QUERY_PATTERN from a multi-clause regex to just \*:\*
  • ChatClientTest.java: Added comprehensive unit tests for:
    • escapeLuceneValue — null, empty, backslash, double-quote, mixed, URL inputs
    • searchWithQuery — null, blank, length limits, valid queries, dangerous pattern rejection, previously false-positive patterns now allowed
    • escapeHtml — null, empty, all special HTML characters, XSS script tag
  • Extended TestableChatClient inner class with searchDocuments override and helper methods for state tracking

Testing

  • New unit tests cover all edge cases for the affected methods
  • Verified that *:* is still rejected while legitimate queries containing script-related keywords pass through
  • Tests use wasSearchDocumentsCalled() to assert whether the actual search was invoked

Breaking Changes

  • None — this change is less restrictive (expands what queries are accepted), not more

Additional Notes

  • The original pattern was introduced in a security hardening PR but proved too aggressive for RAG chat use cases where users may ask natural-language questions about topics like "painless scripting" or "source configuration"
  • Only *:* poses a real risk as it dumps all documents regardless of query intent

🤖 Generated with Claude Code

Reduced the overly broad query rejection pattern that was blocking
legitimate searches containing keywords like `_source`, `_all`, or
`painless` in natural-language queries. Now only rejects the `*:*`
wildcard-all pattern which is the actual security concern.

Added comprehensive unit tests for escapeLuceneValue, searchWithQuery,
and escapeHtml to verify the correct validation behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol self-assigned this Mar 7, 2026
@marevol marevol added this to the 15.6.0 milestone Mar 7, 2026
@marevol marevol merged commit 374059d into master Mar 7, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant