Feature: PDF structure parser and metadata extraction #92

## Overview
Implement a PDF parser that extracts document structure, metadata, and text while skipping binary image data.

## Parent Epic
Part of #91 - Document & Office Format Awareness

## Description
Parse PDF structure (objects, streams, cross-reference tables) and extract meaningful strings from metadata, annotations, bookmarks, and text streams.
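Structure parsing starts at the end of the file: a reader locates the most recent cross-reference section through the `startxref` keyword just before `%%EOF`. The chosen crate likely does this internally, so the std-only sketch below only illustrates the step. `find_startxref` is a hypothetical helper, and the 1024-byte tail window is an assumption. The returned offset may point at either a classic `xref` table or, from PDF 1.5 on, a cross-reference stream object, and the parser has to handle both.

```rust
/// Hypothetical helper: return the byte offset stored after the last
/// `startxref` keyword, which points at the newest cross-reference section.
fn find_startxref(data: &[u8]) -> Option<u64> {
    // Assumption: the trailer lives in the last 1024 bytes of the file.
    let tail = &data[data.len().saturating_sub(1024)..];
    let keyword: &[u8] = b"startxref";
    // Search backwards so the last occurrence (latest incremental update) wins.
    let pos = tail.windows(keyword.len()).rposition(|w| w == keyword)?;
    let digits: String = tail[pos + keyword.len()..]
        .iter()
        // Skip PDF whitespace (space, CR, LF, tab, form feed, NUL).
        .skip_while(|b| matches!(**b, b' ' | b'\r' | b'\n' | b'\t' | 0x0c | 0x00))
        .take_while(|b| b.is_ascii_digit())
        .map(|&b| b as char)
        .collect();
    digits.parse().ok()
}

fn main() {
    let sample = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\nstartxref\n12345\n%%EOF\n";
    println!("{:?}", find_startxref(sample)); // Some(12345)
}
```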

## Implementation Details
- Use the `lopdf` or `pdf` crate
- Parse PDF object structure
- Extract document info dictionary (Title, Author, Subject, Keywords)
- Parse catalog and page tree
- Extract text from content streams
- Identify and skip image streams
- Parse annotations and form fields
- Extract JavaScript from actions
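Image streams can often be identified from the stream dictionary alone, before any decoding. The sketch below assumes a hypothetical, simplified `StreamDict` type (not the `lopdf` or `pdf` API). It treats a stream as an image when `/Subtype` is `/Image` or when its filter chain includes one of the image-only filters (`DCTDecode`, `JPXDecode`, `CCITTFaxDecode`, `JBIG2Decode`). Inline images (the `BI`/`ID`/`EI` operators inside a content stream) are not separate objects, so they would have to be skipped by the content-stream tokenizer instead.

```rust
/// Hypothetical, simplified view of a stream dictionary; a real parser
/// would read these values from the crate's parsed object.
struct StreamDict {
    subtype: Option<String>, // value of /Subtype, without the leading slash
    filters: Vec<String>,    // /Filter entries, in application order
}

/// Filters that the PDF spec defines for image data only.
const IMAGE_ONLY_FILTERS: [&str; 4] = ["DCTDecode", "JPXDecode", "CCITTFaxDecode", "JBIG2Decode"];

fn is_image_stream(dict: &StreamDict) -> bool {
    if dict.subtype.as_deref() == Some("Image") {
        return true;
    }
    // No explicit subtype: fall back to the filter chain.
    dict.filters.iter().any(|f| IMAGE_ONLY_FILTERS.contains(&f.as_str()))
}

fn main() {
    let photo = StreamDict { subtype: Some("Image".into()), filters: vec!["DCTDecode".into()] };
    let page = StreamDict { subtype: None, filters: vec!["FlateDecode".into()] };
    println!("{} {}", is_image_stream(&photo), is_image_stream(&page)); // true false
}
```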

## String Sources
- Document metadata (Title, Author, Subject, Keywords, Creator, Producer)
- Bookmark titles
- Annotation text
- Form field names and values
- Font names
- JavaScript code
- Hyperlink URLs
- Named destinations
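Most of these sources, such as Info entries, bookmark `/Title`, annotation `/Contents`, and form field `/T` and `/V` values, are typically stored as PDF text strings. The extractor therefore has to undo literal-string escaping and detect the encoding. The std-only sketch below assumes the tokenizer has already stripped the outer parentheses. The hypothetical `unescape_literal` applies the spec's escape rules (`\n`, `\(`, `\ddd` octal, backslash line continuations, bare CR/CRLF read as LF). The hypothetical `decode_text_string` decodes UTF-16BE with a byte-order mark and otherwise falls back to Latin-1. Latin-1 only approximates PDFDocEncoding, which differs in a handful of code points, mostly in the 0x18-0x1F and 0x80-0xA0 ranges. Hex strings (`<...>`) are not handled here.

```rust
/// Undo PDF literal-string escapes (outer parentheses already removed).
fn unescape_literal(body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(body.len());
    let mut i = 0;
    while i < body.len() {
        match body[i] {
            b'\\' if i + 1 < body.len() => {
                i += 1;
                match body[i] {
                    b'n' => out.push(b'\n'),
                    b'r' => out.push(b'\r'),
                    b't' => out.push(b'\t'),
                    b'b' => out.push(0x08),
                    b'f' => out.push(0x0c),
                    b'(' | b')' | b'\\' => out.push(body[i]),
                    // Backslash + EOL is a line continuation: emit nothing.
                    b'\r' => if body.get(i + 1) == Some(&b'\n') { i += 1; },
                    b'\n' => {}
                    d @ b'0'..=b'7' => {
                        // Up to three octal digits; high-order overflow is dropped.
                        let mut value = (d - b'0') as u32;
                        let mut digits = 1;
                        while digits < 3 {
                            match body.get(i + 1) {
                                Some(&n) if (b'0'..=b'7').contains(&n) => {
                                    value = value * 8 + (n - b'0') as u32;
                                    i += 1;
                                    digits += 1;
                                }
                                _ => break,
                            }
                        }
                        out.push((value & 0xFF) as u8);
                    }
                    other => out.push(other), // unknown escape: backslash ignored
                }
            }
            b'\r' => {
                // An unescaped CR or CRLF is read as a single LF.
                if body.get(i + 1) == Some(&b'\n') { i += 1; }
                out.push(b'\n');
            }
            b => out.push(b),
        }
        i += 1;
    }
    out
}

/// Decode a text string: UTF-16BE with BOM, else Latin-1 (approximation).
fn decode_text_string(bytes: &[u8]) -> String {
    if bytes.starts_with(&[0xFE, 0xFF]) {
        let units: Vec<u16> = bytes[2..]
            .chunks_exact(2)
            .map(|c| u16::from_be_bytes([c[0], c[1]]))
            .collect();
        String::from_utf16_lossy(&units)
    } else {
        bytes.iter().map(|&b| b as char).collect()
    }
}

fn main() {
    let raw = unescape_literal(br"Quarterly \(Q3\) report\041");
    println!("{}", decode_text_string(&raw)); // Quarterly (Q3) report!
    let utf16: [u8; 6] = [0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69];
    println!("{}", decode_text_string(&utf16)); // Hi
}
```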

## Acceptance Criteria
- [ ] Parse PDF structure (v1.4-1.7)
- [ ] Extract all metadata dictionary entries
- [ ] Parse page content streams for text
- [ ] Skip binary image streams (entropy-based)
- [ ] Extract annotations and bookmarks
- [ ] Handle encrypted PDFs (metadata only)
- [ ] Tests with diverse PDF samples
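For the entropy-based skip criterion, one option is Shannon entropy in bits per byte. The sketch makes two assumptions. First, the 7.5 bits/byte threshold (`BINARY_ENTROPY_THRESHOLD`, a hypothetical name) is a placeholder to be tuned against the test corpus. Second, entropy is measured on the decoded stream, since a `FlateDecode`-compressed text stream also sits near 8 bits/byte. The heuristic is probably best used as a fallback after checking the stream dictionary (`/Subtype /Image`), not as the only signal.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Assumed threshold: near-random (image or still-compressed) data
/// approaches 8 bits/byte; decoded text operators sit well below that.
const BINARY_ENTROPY_THRESHOLD: f64 = 7.5;

fn looks_binary(decoded_stream: &[u8]) -> bool {
    shannon_entropy(decoded_stream) >= BINARY_ENTROPY_THRESHOLD
}

fn main() {
    let text = b"BT /F1 12 Tf 72 712 Td (Hello, world) Tj ET";
    let noise: Vec<u8> = (0..=255u8).cycle().take(4096).collect();
    println!("{:.2} {}", shannon_entropy(text), looks_binary(text));
    println!("{:.2} {}", shannon_entropy(&noise), looks_binary(&noise)); // 8.00 true
}
```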

## Test Cases
- Simple text PDFs
- PDFs with images
- PDFs with forms
- PDFs with JavaScript
- Encrypted PDFs
- Large PDFs (>100 MB)

## Related
Project: #76
