fix(chat): improve security, concurrency, and context handling for RAG chat/LLM#3060

Merged
marevol merged 3 commits into master from
fix/chat-llm-security-concurrency-improvements
Mar 7, 2026
Conversation

@marevol (Contributor) commented Mar 7, 2026

Summary

  • Add security hardening: session ownership verification, prompt injection prevention with delimiter markers
  • Add semaphore-based concurrency control for LLM requests with configurable limits
  • Make context size prompt-type aware for per-type defaults
  • Preserve document order by search score when fetching full content
  • Add new RAG configuration keys for history, highlights, and context tuning
  • Refactor LlmClientManager to consolidate availability checks into getAvailableClient()
  • Fix session integrity issues under concurrent access in ChatClient

Changes Made

Security

  • ChatSessionManager.clearSession(sessionId, userId): New overload validates userId ownership before clearing; rejects mismatched user IDs with 404
  • ChatApiManager: Session clear now passes userId for ownership verification
  • AbstractLlmClient.buildIntentDetectionSystemPrompt(): Added anti-injection instruction appended to system prompt
  • Evaluation prompt: User message and search query wrapped in delimiter markers (--- USER QUERY START --- etc.) to prevent injection
  • generateDocumentNotFoundResponse: Document URL wrapped as [URL: ...] to prevent injection
  • escapeQueryValue (renamed from escapeLuceneValue): Skip NULL characters in query value escaping
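The ownership check described above can be sketched as follows. This is a minimal illustration, not the actual ChatSessionManager code: the Session record, the in-memory map, and the boolean return are all assumptions; the PR reports a mismatch to the API caller as a 404.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of session-ownership verification: a session may only
// be cleared by the user that owns it. Types and storage are illustrative.
public class SessionOwnership {
    record Session(String sessionId, String userId) {}

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    public void put(Session s) {
        sessions.put(s.sessionId(), s);
    }

    // Returns false for an unknown session or a mismatched owner; the API
    // layer would translate that into a 404 so session IDs are not probeable.
    public boolean clearSession(String sessionId, String userId) {
        Session s = sessions.get(sessionId);
        if (s == null || !s.userId().equals(userId)) {
            return false;
        }
        sessions.remove(sessionId);
        return true;
    }
}
```

Treating "wrong owner" identically to "not found" avoids leaking whether a given session ID exists.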

Context & Relevance

  • getContextMaxChars(): Now prompt-type aware, allowing per-type default context sizes
  • Preserve search score order when fetching full document content for context building
  • Tighten relevance evaluator system prompt for stricter filtering
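Preserving score order matters because a bulk fetch of full document content by ID may return results in arbitrary order. A minimal sketch of the reordering step (names are illustrative, not the actual ChatClient code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reorder fetched full-content documents to match the
// original score-sorted ID list before building the LLM context.
public class ScoreOrder {
    public static List<String> reorderByScore(List<String> scoredIds,
                                              Map<String, String> fetchedContent) {
        List<String> ordered = new ArrayList<>();
        for (String id : scoredIds) {              // iterate in score order
            String content = fetchedContent.get(id);
            if (content != null) {                 // skip docs that failed to fetch
                ordered.add(content);
            }
        }
        return ordered;
    }
}
```

This keeps the highest-scoring documents first, so they survive any later context-size truncation.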

Concurrency Control

  • AbstractLlmClient: Added Semaphore-based concurrency limiter initialized in init()
  • chatWithConcurrencyControl() / streamChatWithConcurrencyControl(): All LLM calls now go through concurrency limiting
  • Configurable via rag.llm.{provider}.max.concurrent.requests (default: 5) and rag.llm.{provider}.concurrency.wait.timeout (default: 30000ms)
  • Throws LlmException(ERROR_RATE_LIMIT) when the wait for a free slot times out
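The limiter described above can be sketched with a plain java.util.concurrent.Semaphore. Class and method names here are illustrative; the permit count and timeout stand in for the rag.llm.{provider}.max.concurrent.requests and rag.llm.{provider}.concurrency.wait.timeout settings, and IllegalStateException stands in for LlmException(ERROR_RATE_LIMIT).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal sketch of semaphore-based concurrency limiting for LLM calls.
public class ConcurrencyLimiter {
    private final Semaphore semaphore;
    private final long waitTimeoutMillis;

    public ConcurrencyLimiter(int maxConcurrentRequests, long waitTimeoutMillis) {
        this.semaphore = new Semaphore(maxConcurrentRequests);
        this.waitTimeoutMillis = waitTimeoutMillis;
    }

    public <T> T callWithLimit(Supplier<T> llmCall) {
        boolean acquired;
        try {
            // Wait up to the configured timeout for a free slot.
            acquired = semaphore.tryAcquire(waitTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for an LLM slot", e);
        }
        if (!acquired) {
            // In the PR this surfaces as LlmException(ERROR_RATE_LIMIT).
            throw new IllegalStateException("Too many concurrent LLM requests");
        }
        try {
            return llmCall.get();
        } finally {
            semaphore.release();   // always release, even if the call throws
        }
    }
}
```

Releasing the permit in a finally block is the important part: a failed LLM call must not permanently shrink the pool.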

Reliability

  • ChatClient: User message added to session immediately (before LLM call) for session integrity under concurrency; trimHistory() moved to finally block
  • AbstractLlmClient.addHistoryWithBudget(): Truncation fallback when newest history message exceeds budget
  • buildStreamingRequest(): Simplified history budget; uses getHistoryMaxChars() directly instead of a complex fixed-size calculation
  • All history-adding calls now use addHistoryWithBudget() with getHistoryMaxChars()
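The truncation fallback can be illustrated as follows. This is a hypothetical sketch of the budget idea, not the actual addHistoryWithBudget() signature, which operates on chat messages rather than plain strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a character-budgeted history: messages are kept
// newest-first until the budget runs out; if even the newest message exceeds
// the budget, it is truncated rather than dropped entirely.
public class HistoryBudget {
    public static List<String> addHistoryWithBudget(List<String> history, int maxChars) {
        List<String> kept = new ArrayList<>();
        int remaining = maxChars;
        for (int i = history.size() - 1; i >= 0; i--) {   // newest first
            String msg = history.get(i);
            if (msg.length() <= remaining) {
                kept.add(0, msg);                          // prepend to keep order
                remaining -= msg.length();
            } else if (kept.isEmpty()) {
                // Fallback: truncate the newest message instead of sending no history.
                kept.add(msg.substring(0, remaining));
                break;
            } else {
                break;                                     // older messages no longer fit
            }
        }
        return kept;
    }
}
```

Without the fallback, a single oversized newest message would silently produce an empty history, which is worse for answer quality than a truncated one.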

Refactoring

  • LlmClientManager.getAvailableClient(): Consolidated duplicate !available() checks across all 10+ delegation methods
  • Javadoc updated from "Lucene query" to "Fess query" in IntentDetectionResult, ChatClient
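The consolidation pattern is straightforward to sketch. The interface and exception below are illustrative stand-ins, not the real LlmClientManager API:

```java
// Hypothetical sketch of the consolidation: instead of repeating an
// available() check in every delegation method, a single getAvailableClient()
// performs the check once and every delegate funnels through it.
public class ClientManagerSketch {
    interface LlmClient {
        boolean available();
        String chat(String prompt);
    }

    private final LlmClient client;

    public ClientManagerSketch(LlmClient client) {
        this.client = client;
    }

    private LlmClient getAvailableClient() {
        if (client == null || !client.available()) {
            throw new IllegalStateException("No LLM client available");
        }
        return client;
    }

    // Each of the 10+ delegation methods reduces to a one-liner like this.
    public String chat(String prompt) {
        return getAvailableClient().chat(prompt);
    }
}
```

Beyond removing duplication, this guarantees every future delegation method gets the availability check for free.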

Configuration

Added to fess_config.properties:

rag.chat.highlight.fragment.size=500
rag.chat.highlight.number.of.fragments=3
rag.chat.history.assistant.content=source_titles
rag.chat.history.assistant.max.chars=500
rag.chat.history.assistant.summary.max.chars=500
rag.chat.history.max.chars=2000

Additional keys cover intent history max messages, maximum message length, and the per-provider concurrency settings.

Testing

  • ChatClientTest: Updated to reflect new session message ordering (user message added before LLM call)
  • AbstractLlmClientTest: Updated for method rename
  • IntentDetectionResultTest: Updated accordingly

Breaking Changes

  • ChatSessionManager.clearSession(String sessionId) (the overload without userId) still exists for admin/internal use, but the API-facing path now requires a userId ownership match
  • escapeLuceneValue renamed to escapeQueryValue in ChatClient (internal method)

🤖 Generated with Claude Code

…t/LLM

- Add session ownership verification in ChatSessionManager.clearSession() to
  prevent cross-user session clearing; validate userId on session reuse
- Add semaphore-based concurrency control in AbstractLlmClient to limit
  concurrent LLM requests (configurable via rag.llm.*.max.concurrent.requests)
- Prevent prompt injection in intent detection system prompt and evaluation
  prompts by adding delimiter markers around user/search inputs
- Wrap document URLs in [URL: ...] format to prevent injection in
  generateDocumentNotFoundResponse
- Refactor LlmClientManager to use getAvailableClient() consolidating
  duplicate availability checks across all delegation methods
- Move user message addition and trimHistory to finally block in ChatClient
  for session integrity under concurrent access and error conditions
- Simplify history budget calculation to use getHistoryMaxChars() directly
- Add truncation fallback when newest history message exceeds budget
- Rename escapeLuceneValue to escapeQueryValue; skip NULL characters in escaping
- Add new config properties for history max chars and highlight settings
- Update Javadoc terminology from "Lucene query" to "Fess query"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol self-assigned this Mar 7, 2026
@marevol marevol added this to the 15.6.0 milestone Mar 7, 2026
…n, and RAG config keys

- Make getContextMaxChars prompt-type aware to allow per-type defaults
- Preserve search score order when fetching full document content
- Tighten relevance evaluator system prompt for stricter filtering
- Add config keys: intent history max messages, message max length,
  highlight fragment size/count, history assistant content/chars,
  and history max chars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol changed the title from "fix(chat): add security hardening and concurrency control for RAG chat/LLM" to "fix(chat): improve security, concurrency, and context handling for RAG chat/LLM" on Mar 7, 2026
- Remove the initial "session" SSE event from ChatApiManager and its
  client-side handler in chat.js (sessionId is already set before SSE starts)
- Extract private findSession() to separate lookup from touch, preventing
  unintended session lifetime extension during internal lookups
- Add explicit session.touch() calls in getOrCreateSession and clearSession
  for correct session lifetime tracking
- Pass userId to clearSession in ChatAction for proper ownership validation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@marevol marevol merged commit e4f2d67 into master Mar 7, 2026
1 check passed
