Bug
The font-size heuristic in PdfParser.ClassifySegment compares the largest font size anywhere on the page, rather than the font size of the segment being classified, against the heading threshold:
Location: src/LambdaRag.Parsing/PdfParser.cs ~line 127
double maxSize = lineFontSizes.Values.Max();
if (maxSize >= medianFontSize * 1.2 && text.Length <= 200)
    return ContentBlockKind.Heading;
Impact: On any page that contains a single large title, every paragraph ≤200 chars on that page is classified as a heading. This produces phantom heading paths and corrupts HeadingPath for downstream selectors / projections.
Expected: Map each parsed segment back to the y-range of the letters that composed it (or, equivalently, build paragraph boundaries from Letters rather than from page.Text), and use that segment's own max font size for the comparison.
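One possible shape for the expected behaviour is sketched below. It is not the actual PdfParser code: `Glyph`, `Segment`, `SegmentClassifier`, and the `Paragraph` enum member are hypothetical stand-ins introduced for illustration, and the sketch assumes each segment already carries the y-range of the letters that composed it. The 1.2 multiplier and the 200-character limit are taken from the current heuristic. It needs C# 10 or later for `record struct`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Redeclared here only to keep the sketch self-contained;
// the Paragraph member is an assumption.
public enum ContentBlockKind { Paragraph, Heading }

// Hypothetical stand-in for a parsed letter: baseline y-coordinate and font size.
public readonly record struct Glyph(double Y, double FontSize);

// Hypothetical segment: its text plus the y-range of the letters that composed it.
public sealed record Segment(string Text, double MinY, double MaxY);

public static class SegmentClassifier
{
    public static ContentBlockKind Classify(
        Segment segment, IReadOnlyList<Glyph> pageGlyphs, double medianFontSize)
    {
        // Only consider glyphs whose baseline falls inside this segment's y-range,
        // so a large title elsewhere on the page cannot influence the result.
        var sizes = pageGlyphs
            .Where(g => g.Y >= segment.MinY && g.Y <= segment.MaxY)
            .Select(g => g.FontSize)
            .ToList();

        // No glyphs mapped to the segment: fall back to the non-heading kind.
        if (sizes.Count == 0)
            return ContentBlockKind.Paragraph;

        // Same threshold as the current heuristic, applied to the segment's own max.
        double segmentMaxSize = sizes.Max();
        return segmentMaxSize >= medianFontSize * 1.2 && segment.Text.Length <= 200
            ? ContentBlockKind.Heading
            : ContentBlockKind.Paragraph;
    }
}
```

With a 24 pt title at y = 710, 12 pt body text at y = 620 to 640, and a median font size of 12, the title segment is classified as a heading and the short body paragraph is not, whereas the page-wide maximum would mark both as headings.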
Acceptance criteria