Add comprehensive document processing test suite #456

## Problem

The RAG document processing pipeline has significant test coverage gaps. The following scenarios are **not tested**:

### File Format Edge Cases
- [ ] Corrupted PDF files (malformed structure)
- [ ] Password-protected/encrypted PDFs
- [ ] PDFs with only images (no extractable text)
- [ ] Zero-byte files
- [ ] Files with only whitespace
- [ ] Binary files misidentified as text
- [ ] Files with mixed encodings
- [ ] Unicode edge cases (RTL text, emoji, control characters)
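
Several of these inputs are small enough to generate rather than hand-craft. The following is a minimal sketch of a generator, assuming the fixture names from the Proposed Approach; the helper name `write_edge_case_fixtures` and the exact byte payloads are illustrative assumptions, not existing project code. The encrypted PDF is left out because producing one needs a PDF library.

```python
from pathlib import Path


def write_edge_case_fixtures(target_dir: Path) -> dict[str, Path]:
    """Write small edge-case files and return their paths by file name.

    Hypothetical helper: the payloads below are illustrative choices.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    payloads = {
        # Zero-byte file.
        "empty.txt": b"",
        # Only spaces, tabs, and mixed line endings.
        "whitespace.txt": b" \t\n\r\n   \n",
        # Every byte value, so the file cannot be valid UTF-8 text.
        "binary.dat": bytes(range(256)) * 4,
        # Hebrew (RTL) text, an emoji, a BEL control character,
        # and a right-to-left override character.
        "unicode_edge.txt": "\u05e9\u05dc\u05d5\u05dd \U0001F600 \u0007 \u202e abc".encode("utf-8"),
        # Valid PDF header, then truncated: no xref table, trailer, or EOF marker.
        "corrupted.pdf": b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog",
    }
    paths = {}
    for name, data in payloads.items():
        path = target_dir / name
        path.write_bytes(data)
        paths[name] = path
    return paths
```

Generating the files from a script keeps the committed fixtures reproducible and makes it clear what each one is meant to exercise.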

### Resource & Memory
- [ ] Large PDFs (>100MB) — OOM behavior
- [ ] Memory limit enforcement (max_indexed_files, max_total_chunks boundaries)
- [ ] LRU eviction under pressure
- [ ] Embedding API timeout (180s boundary)
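
As an illustration of the boundary and eviction cases, here is a sketch of the shape such a test could take. `BoundedIndex` is a hypothetical stand-in that models only a file-count limit with least-recently-used eviction; the real index class, its method names, and its eviction details are assumptions to be replaced with the actual API.

```python
from collections import OrderedDict


class BoundedIndex:
    """Hypothetical stand-in for the real index; models only the file-count limit."""

    def __init__(self, max_indexed_files: int):
        self.max_indexed_files = max_indexed_files
        self._files = OrderedDict()  # least recently used entry first

    def add(self, path, chunks):
        """Index a file and return the paths evicted to stay within the limit."""
        if path in self._files:
            self._files.move_to_end(path)
        self._files[path] = chunks
        evicted = []
        while len(self._files) > self.max_indexed_files:
            oldest, _ = self._files.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def touch(self, path):
        """Mark a file as recently used, as a query would."""
        self._files.move_to_end(path)

    def paths(self):
        return list(self._files)


def test_eviction_at_boundary():
    index = BoundedIndex(max_indexed_files=2)
    assert index.add("a.pdf", ["a"]) == []
    assert index.add("b.pdf", ["b"]) == []  # exactly at the limit: nothing evicted
    index.touch("a.pdf")                    # a.pdf is now the most recently used
    assert index.add("c.pdf", ["c"]) == ["b.pdf"]  # one over the limit
    assert index.paths() == ["a.pdf", "c.pdf"]
```

The useful assertions sit at the limit and one past it, since off-by-one errors in the eviction loop only show up there.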

### Concurrency
- [ ] Concurrent document indexing
- [ ] Document removal during active query
- [ ] Concurrent uploads of same file
- [ ] Multiple sessions querying same index
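
For the duplicate-upload case, one possible test shape is to release all workers at the same moment and then assert on the final state. In this sketch `DedupingIndex` is a hypothetical stand-in, not the project's index; only the `threading` usage is meant to carry over.

```python
import threading


class DedupingIndex:
    """Hypothetical stand-in: indexes each path at most once, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.chunks_by_path = {}
        self.index_calls = 0

    def add(self, path, text):
        """Return True if this call did the indexing, False if it was a duplicate."""
        with self._lock:
            if path in self.chunks_by_path:
                return False
            self.index_calls += 1
            self.chunks_by_path[path] = text.split()
            return True


def upload_same_file_concurrently(index, path, text, workers=8):
    """Start `workers` threads that all call index.add at the same time."""
    barrier = threading.Barrier(workers)
    results = []

    def worker():
        barrier.wait()  # line every thread up before the racing call
        results.append(index.add(path, text))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)
    return results


def test_concurrent_uploads_index_once():
    index = DedupingIndex()
    results = upload_same_file_concurrently(index, "report.pdf", "one two three")
    assert len(results) == 8
    assert results.count(True) == 1
    assert index.index_calls == 1
    assert index.chunks_by_path["report.pdf"] == ["one", "two", "three"]
```

The barrier makes the race more likely to occur on every run, and asserting only on the final state keeps the test independent of thread scheduling.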

### Error Recovery
- [ ] VLM model unavailable during PDF processing
- [ ] Lemonade Server restart during indexing
- [ ] Corrupted cache file recovery
- [ ] Partial embedding failure (some batches succeed, some fail)
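
The partial-failure case can be simulated without a server by giving a mock a sequence of results and exceptions. A minimal sketch follows; `embed_in_batches` is a hypothetical helper standing in for the real batching code, and using `ConnectionError` as the failure type is an assumption.

```python
from unittest.mock import Mock


def embed_in_batches(texts, embed_batch, batch_size=2):
    """Hypothetical helper: embed batch by batch and record which batches failed."""
    vectors, failed_batches = {}, []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            for offset, vector in enumerate(embed_batch(batch)):
                vectors[start + offset] = vector
        except ConnectionError:
            failed_batches.append(start // batch_size)
    return vectors, failed_batches


def test_partial_embedding_failure():
    # First and third batches succeed; the second raises as if the server dropped.
    embed_batch = Mock(side_effect=[
        [[0.1], [0.2]],
        ConnectionError("server restarted"),
        [[0.5]],
    ])
    vectors, failed_batches = embed_in_batches(["a", "b", "c", "d", "e"], embed_batch)
    assert failed_batches == [1]
    assert sorted(vectors) == [0, 1, 4]  # texts 2 and 3 have no embedding
    assert embed_batch.call_count == 3   # the failure did not abort later batches
```

Whether the real pipeline should keep the successful batches, retry, or roll back everything is a design decision the test would then pin down.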

### Integration
- [ ] Symlink detection and rejection
- [ ] File disappearing after permission check
- [ ] Very small chunk_size (1-10 characters)
- [ ] chunk_overlap = 0 boundary
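
The chunking boundaries lend themselves to plain assertions. The sketch below uses a hypothetical character-based `chunk_text` as a stand-in for the real chunker, whose signature and splitting rules may differ; the properties asserted are the part intended to transfer.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Hypothetical character-based chunker standing in for the real one."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    if not 0 <= chunk_overlap < chunk_size:
        # An overlap equal to chunk_size would give a step of zero.
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def test_chunk_boundaries():
    # chunk_overlap = 0: chunks partition the text exactly.
    chunks = chunk_text("abcdefg", chunk_size=3, chunk_overlap=0)
    assert chunks == ["abc", "def", "g"]
    assert "".join(chunks) == "abcdefg"

    # Smallest chunk_size: one character per chunk.
    assert chunk_text("abc", chunk_size=1) == ["a", "b", "c"]

    # Empty input produces no chunks.
    assert chunk_text("", chunk_size=5) == []

    # Non-zero overlap repeats the tail of each chunk at the start of the next.
    assert chunk_text("abcdef", chunk_size=3, chunk_overlap=1) == ["abc", "cde", "ef"]

    # Overlap equal to chunk_size must be rejected rather than loop or stall.
    try:
        chunk_text("abc", chunk_size=2, chunk_overlap=2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```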

## Proposed Approach
1. Create test fixtures in `tests/fixtures/documents/`: `corrupted.pdf`, `encrypted.pdf`, `empty.txt`, `whitespace.txt`, `binary.dat`, `unicode_edge.txt`
2. Unit tests with mocked Lemonade in `tests/unit/rag/`
3. Integration tests with real server in `tests/integration/test_rag_robustness.py`
4. Concurrent tests using `threading` + assertions on final state
5. Memory tests using resource limits

## Files
- `tests/unit/rag/` (new directory)
- `tests/integration/test_rag_robustness.py` (new file)
- `tests/fixtures/documents/` (new directory with test fixtures)

## Acceptance Criteria
- [ ] All scenarios above have at least one test
- [ ] Unit tests pass with mocked dependencies (no Lemonade required)
- [ ] Tests run in CI (< 60 seconds total)
- [ ] Test fixtures committed and documented
