[hardening] PdfParser silently produces empty output for scanned / image-only PDFs

## Enhancement

`PdfParser.ParseAsync` produces an empty `ParsedDocument` when given a scanned / image-only PDF (no text layer). Downstream the pipeline silently reports zero findings, which is indistinguishable from a clean document — a serious failure mode for real customer uploads.

## Acceptance criteria
- [ ] After parse, if total canonical text length is 0 AND the PDF has ≥1 page with non-zero image content, throw a typed `ScannedPdfNotSupportedException` (or equivalent) with a clear message.
- [ ] Alternatively (and tracked here as one acceptable resolution): emit the failure into `ParsedDocument.Metadata` with key `parser_warning` and add an integration check that surfaces it in CLI / API output.
- [ ] Decision documented in the implementation PR.
- [ ] Unit test with a small image-only PDF fixture.
- [ ] Engine genericity guard still passes.

## Out of scope
- OCR integration (would be its own feature, not hardening).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[hardening] PdfParser silently produces empty output for scanned / image-only PDFs #60

Enhancement

Acceptance criteria

Out of scope

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[hardening] PdfParser silently produces empty output for scanned / image-only PDFs #60

Description

Enhancement

Acceptance criteria

Out of scope

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions