Enhancement
PdfParser.ParseAsync produces an empty ParsedDocument when given a scanned / image-only PDF (no text layer). Downstream, the pipeline silently reports zero findings, which is indistinguishable from the result for a clean document. This is a serious failure mode for real customer uploads.
Acceptance criteria
- Throw ScannedPdfNotSupportedException (or equivalent) with a clear message.
- Populate ParsedDocument.Metadata with key parser_warning and add an integration check that surfaces it in CLI / API output.
Out of scope
- OCR integration (would be its own feature, not hardening).
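
For discussion, here is a minimal sketch of the two behaviours named in the acceptance criteria. It is not the existing parser code. The ParsedDocument shape, the ScannedPdfGuard helper, and the idea of passing per-page extracted text are all assumptions made for illustration. "No text layer" is approximated as "every page produced only whitespace", which the real implementation may want to refine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Exception type named in the acceptance criteria; this definition is a guess at its shape.
public sealed class ScannedPdfNotSupportedException : Exception
{
    public ScannedPdfNotSupportedException(string message) : base(message) { }
}

// Hypothetical stand-in for the real ParsedDocument; the actual type may differ.
public sealed class ParsedDocument
{
    public string Text { get; init; } = "";
    public Dictionary<string, string> Metadata { get; } = new();
}

// Hypothetical helper that PdfParser.ParseAsync could call after text extraction.
public static class ScannedPdfGuard
{
    public const string WarningKey = "parser_warning";

    // Assumption: a PDF with at least one page where every page yields only
    // whitespace is treated as scanned / image-only.
    public static bool LooksImageOnly(IReadOnlyList<string> pageTexts) =>
        pageTexts.Count > 0 && pageTexts.All(string.IsNullOrWhiteSpace);

    // Behaviour 1: fail loudly with a clear message.
    public static void ThrowIfImageOnly(IReadOnlyList<string> pageTexts, string fileName)
    {
        if (LooksImageOnly(pageTexts))
        {
            throw new ScannedPdfNotSupportedException(
                $"'{fileName}' has {pageTexts.Count} page(s) but no extractable text. " +
                "Scanned / image-only PDFs are not supported.");
        }
    }

    // Behaviour 2: still return a document, but flag it so CLI / API output can surface it.
    public static ParsedDocument BuildWithWarning(IReadOnlyList<string> pageTexts)
    {
        var doc = new ParsedDocument { Text = string.Join("\n", pageTexts) };
        if (LooksImageOnly(pageTexts))
        {
            doc.Metadata[WarningKey] = "No text layer detected; the PDF appears to be scanned.";
        }
        return doc;
    }
}
```

Either way, a caller can tell "nothing was parsed" apart from "nothing was found", which is the distinction the pipeline currently loses.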