fix(chat): improve security, concurrency, and context handling for RAG chat/LLM#3060

Merged
marevol merged 3 commits into master from
fix/chat-llm-security-concurrency-improvements
Mar 7, 2026
Conversation

@marevol (Contributor) commented Mar 7, 2026

Summary

  • Add security hardening: session ownership verification, prompt injection prevention with delimiter markers
  • Add semaphore-based concurrency control for LLM requests with configurable limits
  • Make context size prompt-type aware for per-type defaults
  • Preserve document order by search score when fetching full content
  • Add new RAG configuration keys for history, highlights, and context tuning
  • Refactor LlmClientManager to consolidate availability checks into getAvailableClient()
  • Fix session integrity issues under concurrent access in ChatClient

Changes Made

Security

  • ChatSessionManager.clearSession(sessionId, userId): New overload validates userId ownership before clearing; rejects mismatched user IDs with 404
  • ChatApiManager: Session clear now passes userId for ownership verification
  • AbstractLlmClient.buildIntentDetectionSystemPrompt(): Added anti-injection instruction appended to system prompt
  • Evaluation prompt: User message and search query wrapped in delimiter markers (--- USER QUERY START --- etc.) to prevent injection
  • generateDocumentNotFoundResponse: Document URL wrapped as [URL: ...] to prevent injection
  • escapeQueryValue (renamed from escapeLuceneValue): Skip NULL characters in query value escaping
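The ownership check described above can be sketched as follows. This is a minimal illustration, not the actual ChatSessionManager code: the Session record, the in-memory map, and the boolean return are all assumptions; the PR reports a mismatch to the API caller as a 404.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of session-ownership verification: a session may only
// be cleared by the user that owns it. Types and storage are illustrative.
public class SessionOwnership {
    record Session(String sessionId, String userId) {}

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    public void put(Session s) {
        sessions.put(s.sessionId(), s);
    }

    // Returns false for an unknown session or a mismatched owner; the API
    // layer would translate that into a 404 so session IDs are not probeable.
    public boolean clearSession(String sessionId, String userId) {
        Session s = sessions.get(sessionId);
        if (s == null || !s.userId().equals(userId)) {
            return false;
        }
        sessions.remove(sessionId);
        return true;
    }
}
```

Treating "wrong owner" identically to "not found" avoids leaking whether a given session ID exists.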

Context & Relevance

  • getContextMaxChars(): Now prompt-type aware, allowing per-type default context sizes
  • Preserve search score order when fetching full document content for context building
  • Tighten relevance evaluator system prompt for stricter filtering
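Preserving score order matters because a bulk fetch of full document content by ID may return results in arbitrary order. A minimal sketch of the reordering step (names are illustrative, not the actual ChatClient code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reorder fetched full-content documents to match the
// original score-sorted ID list before building the LLM context.
public class ScoreOrder {
    public static List<String> reorderByScore(List<String> scoredIds,
                                              Map<String, String> fetchedContent) {
        List<String> ordered = new ArrayList<>();
        for (String id : scoredIds) {              // iterate in score order
            String content = fetchedContent.get(id);
            if (content != null) {                 // skip docs that failed to fetch
                ordered.add(content);
            }
        }
        return ordered;
    }
}
```

This keeps the highest-scoring documents first, so they survive any later context-size truncation.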

Concurrency Control

  • AbstractLlmClient: Added Semaphore-based concurrency limiter initialized in init()
  • chatWithConcurrencyControl() / streamChatWithConcurrencyControl(): All LLM calls now go through concurrency limiting
  • Configurable via rag.llm.{provider}.max.concurrent.requests (default: 5) and rag.llm.{provider}.concurrency.wait.timeout (default: 30000ms)
  • Throws LlmException(ERROR_RATE_LIMIT) when the wait for a free slot times out
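The limiter described above can be sketched with a plain java.util.concurrent.Semaphore. Class and method names here are illustrative; the permit count and timeout stand in for the rag.llm.{provider}.max.concurrent.requests and rag.llm.{provider}.concurrency.wait.timeout settings, and IllegalStateException stands in for LlmException(ERROR_RATE_LIMIT).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal sketch of semaphore-based concurrency limiting for LLM calls.
public class ConcurrencyLimiter {
    private final Semaphore semaphore;
    private final long waitTimeoutMillis;

    public ConcurrencyLimiter(int maxConcurrentRequests, long waitTimeoutMillis) {
        this.semaphore = new Semaphore(maxConcurrentRequests);
        this.waitTimeoutMillis = waitTimeoutMillis;
    }

    public <T> T callWithLimit(Supplier<T> llmCall) {
        boolean acquired;
        try {
            // Wait up to the configured timeout for a free slot.
            acquired = semaphore.tryAcquire(waitTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for an LLM slot", e);
        }
        if (!acquired) {
            // In the PR this surfaces as LlmException(ERROR_RATE_LIMIT).
            throw new IllegalStateException("Too many concurrent LLM requests");
        }
        try {
            return llmCall.get();
        } finally {
            semaphore.release();   // always release, even if the call throws
        }
    }
}
```

Releasing the permit in a finally block is the important part: a failed LLM call must not permanently shrink the pool.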

Reliability

  • ChatClient: User message added to session immediately (before LLM call) for session integrity under concurrency; trimHistory() moved to finally block
  • AbstractLlmClient.addHistoryWithBudget(): Truncation fallback when newest history message exceeds budget
  • buildStreamingRequest(): Simplified history budget; uses getHistoryMaxChars() directly instead of a complex fixed-size calculation
  • All history-adding calls now use addHistoryWithBudget() with getHistoryMaxChars()
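The truncation fallback can be illustrated as follows. This is a hypothetical sketch of the budget idea, not the actual addHistoryWithBudget() signature, which operates on chat messages rather than plain strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a character-budgeted history: messages are kept
// newest-first until the budget runs out; if even the newest message exceeds
// the budget, it is truncated rather than dropped entirely.
public class HistoryBudget {
    public static List<String> addHistoryWithBudget(List<String> history, int maxChars) {
        List<String> kept = new ArrayList<>();
        int remaining = maxChars;
        for (int i = history.size() - 1; i >= 0; i--) {   // newest first
            String msg = history.get(i);
            if (msg.length() <= remaining) {
                kept.add(0, msg);                          // prepend to keep order
                remaining -= msg.length();
            } else if (kept.isEmpty()) {
                // Fallback: truncate the newest message instead of sending no history.
                kept.add(msg.substring(0, remaining));
                break;
            } else {
                break;                                     // older messages no longer fit
            }
        }
        return kept;
    }
}
```

Without the fallback, a single oversized newest message would silently produce an empty history, which is worse for answer quality than a truncated one.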

Refactoring

  • LlmClientManager.getAvailableClient(): Consolidated duplicate !available() checks across all 10+ delegation methods
  • Javadoc updated from "Lucene query" to "Fess query" in IntentDetectionResult, ChatClient
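The consolidation pattern is straightforward to sketch. The interface and exception below are illustrative stand-ins, not the real LlmClientManager API:

```java
// Hypothetical sketch of the consolidation: instead of repeating an
// available() check in every delegation method, a single getAvailableClient()
// performs the check once and every delegate funnels through it.
public class ClientManagerSketch {
    interface LlmClient {
        boolean available();
        String chat(String prompt);
    }

    private final LlmClient client;

    public ClientManagerSketch(LlmClient client) {
        this.client = client;
    }

    private LlmClient getAvailableClient() {
        if (client == null || !client.available()) {
            throw new IllegalStateException("No LLM client available");
        }
        return client;
    }

    // Each of the 10+ delegation methods reduces to a one-liner like this.
    public String chat(String prompt) {
        return getAvailableClient().chat(prompt);
    }
}
```

Beyond removing duplication, this guarantees every future delegation method gets the availability check for free.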

Configuration

Added to fess_config.properties:

rag.chat.highlight.fragment.size=500
rag.chat.highlight.number.of.fragments=3
rag.chat.history.assistant.content=source_titles
rag.chat.history.assistant.max.chars=500
rag.chat.history.assistant.summary.max.chars=500
rag.chat.history.max.chars=2000

Additional keys cover intent history max messages, maximum message length, and the per-provider concurrency settings.

Testing

  • ChatClientTest: Updated to reflect new session message ordering (user message added before LLM call)
  • AbstractLlmClientTest: Updated for method rename
  • IntentDetectionResultTest: Updated accordingly

Breaking Changes

  • ChatSessionManager.clearSession(String sessionId) (the overload without userId) still exists for admin/internal use, but the API-facing path now requires a userId ownership match
  • escapeLuceneValue renamed to escapeQueryValue in ChatClient (internal method)

🤖 Generated with Claude Code

…t/LLM

- Add session ownership verification in ChatSessionManager.clearSession() to
  prevent cross-user session clearing; validate userId on session reuse
- Add semaphore-based concurrency control in AbstractLlmClient to limit
  concurrent LLM requests (configurable via rag.llm.*.max.concurrent.requests)
- Prevent prompt injection in intent detection system prompt and evaluation
  prompts by adding delimiter markers around user/search inputs
- Wrap document URLs in [URL: ...] format to prevent injection in
  generateDocumentNotFoundResponse
- Refactor LlmClientManager to use getAvailableClient() consolidating
  duplicate availability checks across all delegation methods
- Move user message addition and trimHistory to finally block in ChatClient
  for session integrity under concurrent access and error conditions
- Simplify history budget calculation to use getHistoryMaxChars() directly
- Add truncation fallback when newest history message exceeds budget
- Rename escapeLuceneValue to escapeQueryValue; skip NULL characters in escaping
- Add new config properties for history max chars and highlight settings
- Update Javadoc terminology from "Lucene query" to "Fess query"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol self-assigned this Mar 7, 2026
@marevol marevol added this to the 15.6.0 milestone Mar 7, 2026
…n, and RAG config keys

- Make getContextMaxChars prompt-type aware to allow per-type defaults
- Preserve search score order when fetching full document content
- Tighten relevance evaluator system prompt for stricter filtering
- Add config keys: intent history max messages, message max length,
  highlight fragment size/count, history assistant content/chars,
  and history max chars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol changed the title from "fix(chat): add security hardening and concurrency control for RAG chat/LLM" to "fix(chat): improve security, concurrency, and context handling for RAG chat/LLM" on Mar 7, 2026
- Remove the initial "session" SSE event from ChatApiManager and its
  client-side handler in chat.js (sessionId is already set before SSE starts)
- Extract private findSession() to separate lookup from touch, preventing
  unintended session lifetime extension during internal lookups
- Add explicit session.touch() calls in getOrCreateSession and clearSession
  for correct session lifetime tracking
- Pass userId to clearSession in ChatAction for proper ownership validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol merged commit e4f2d67 into master Mar 7, 2026
1 check passed
