Problem
`_extract_text_from_pdf` (`src/gaia/rag/sdk.py:413-578`) does not detect encrypted or password-protected PDFs. Extraction from an encrypted PDF yields empty text but still reports success:
```python
reader = PdfReader(pdf_path)  # Succeeds even if encrypted
for i, page in enumerate(reader.pages):
    pypdf_text = page.extract_text()  # Returns empty string for encrypted pages
```
The user sees the document as "indexed" with 0 chunks, and every query against it returns no results.
Impact
- Encrypted PDFs silently produce empty indices
- Corrupted PDFs crash with unhelpful exceptions
- Users are never told that their PDF could not be read
- Error message ("No text content found") is misleading for encrypted files
Proposed Fix
- Check `reader.is_encrypted` before processing and raise a specific error if the file cannot be opened
- Catch `PdfReadError` for corrupted PDFs and report a user-friendly message
- Return specific error codes: `encrypted`, `corrupted`, `empty`, `unsupported`
- Provide remediation guidance: "Remove PDF password using qpdf or pdftk"
- Handle partially readable PDFs (some pages encrypted, some not)
- Add `pdf_status` to metadata: `readable`, `partial`, `encrypted`, `corrupted`
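The checks above could be sketched roughly as follows. This is a minimal illustration, not the existing `sdk.py` code: `PdfExtractionError`, `classify_pages`, `extract_pdf_text`, and `ENCRYPTED_HINT` are hypothetical names, and it assumes a recent `pypdf` in which `decrypt()` returns a falsy value when the password does not match. The "unsupported" code is not exercised here.

```python
from typing import List, Optional, Tuple

# All names below are hypothetical; only the pypdf calls are library API.
ENCRYPTED_HINT = "Remove PDF password using qpdf or pdftk"


class PdfExtractionError(Exception):
    """Extraction failure tagged with one of the proposed error codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code  # "encrypted", "corrupted", "empty" or "unsupported"


def classify_pages(page_texts: List[Optional[str]]) -> str:
    """Return a pdf_status for per-page text, where None marks a failed page."""
    has_text = any(text and text.strip() for text in page_texts)
    has_failures = any(text is None for text in page_texts)
    if not has_text:
        raise PdfExtractionError("empty", "No extractable text found in PDF")
    return "partial" if has_failures else "readable"


def extract_pdf_text(pdf_path: str) -> Tuple[List[Optional[str]], str]:
    """Extract per-page text plus a pdf_status, raising PdfExtractionError on failure."""
    # Imported here so the helpers above can be used without pypdf installed.
    from pypdf import PdfReader
    from pypdf.errors import PdfReadError

    try:
        reader = PdfReader(pdf_path)
        # is_encrypted stays True for owner-password-only files, so try the
        # empty user password first; a falsy result means still locked.
        if reader.is_encrypted and not reader.decrypt(""):
            raise PdfExtractionError(
                "encrypted", f"{pdf_path} is password-protected. {ENCRYPTED_HINT}"
            )
        page_texts: List[Optional[str]] = []
        for page in reader.pages:
            try:
                page_texts.append(page.extract_text())
            except Exception:  # one unreadable page should not sink the rest
                page_texts.append(None)
    except PdfReadError as exc:
        raise PdfExtractionError(
            "corrupted", f"{pdf_path} could not be parsed: {exc}"
        ) from exc

    return page_texts, classify_pages(page_texts)
```

A caller could store the returned status under `pdf_status` in the document metadata and surface `exc.code` together with the message when indexing fails, so a document is never reported as "indexed" with 0 chunks.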
Files
- `src/gaia/rag/sdk.py` (lines 413-578, `_extract_text_from_pdf`)
Acceptance Criteria