[hardening] PdfParser font-size heading heuristic uses page-wide max instead of segment-local size

## Bug

The font-size heuristic in `PdfParser.ClassifySegment` compares the largest font size *anywhere on the page* against the median, rather than the font size of the segment being classified:

**Location:** `src/LambdaRag.Parsing/PdfParser.cs` ~line 127
```csharp
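// BUG: lineFontSizes covers every line on the page, so maxSize is the
// page-wide maximum, not the size of the segment being classified.
double maxSize = lineFontSizes.Values.Max();
if (maxSize >= medianFontSize * 1.2 && text.Length <= 200)
    return ContentBlockKind.Heading;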
```

**Impact:** On any page that contains a single title at least 1.2× the median font size, every segment of 200 characters or fewer on that page is classified as a heading. This produces phantom heading paths and corrupts `HeadingPath` for downstream selectors and projections.

**Expected:** Map each parsed segment back to the y-range of the letters that composed it (or, equivalently, build paragraph boundaries from `Letters` rather than from `page.Text`), and use that segment's own max font size for the comparison.
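A minimal sketch of the segment-local comparison, not the actual fix. `Glyph`, `Segment`, `SegmentHeadingSketch`, and the `Paragraph` enum member are hypothetical stand-ins, since the parser's real types are not shown in this issue. The sketch assumes each segment already carries the y-range of its letters; only the `1.2` ratio and the 200-character limit come from the existing code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in; only Heading appears in the issue, Paragraph is assumed.
public enum ContentBlockKind { Paragraph, Heading }

// One letter with its y-coordinate and font size (assumed shape).
public sealed record Glyph(double Y, double FontSize);

// A segment's text plus the vertical range its letters occupy (assumed shape).
public sealed record Segment(string Text, double YMin, double YMax);

public static class SegmentHeadingSketch
{
    public static ContentBlockKind Classify(
        Segment segment, IReadOnlyList<Glyph> pageGlyphs, double medianFontSize)
    {
        // Only glyphs whose y falls inside this segment's range are considered.
        var localSizes = pageGlyphs
            .Where(g => g.Y >= segment.YMin && g.Y <= segment.YMax)
            .Select(g => g.FontSize)
            .ToList();

        // No glyphs mapped to the segment: do not guess, treat it as body text.
        if (localSizes.Count == 0)
            return ContentBlockKind.Paragraph;

        // Same thresholds as the current heuristic, applied to the segment's own max.
        double segmentMax = localSizes.Max();
        return segmentMax >= medianFontSize * 1.2 && segment.Text.Length <= 200
            ? ContentBlockKind.Heading
            : ContentBlockKind.Paragraph;
    }
}
```

With a 24 pt title at y = 700, 11 pt body letters between y = 600 and y = 650, and a median of 11, the title segment meets the 13.2 pt threshold and the body segment does not, so only the title is classified as a heading.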

## Acceptance criteria
- [ ] Segment-level font-size measurement implemented.
- [ ] Unit test with a synthetic two-section PDF (one large title, then several normal paragraphs) verifying only the title is classified as a heading.
- [ ] Existing PDF parser tests still pass.
- [ ] Engine genericity guard still passes.
