Fidelity release focused on text extraction and AcroForm rendering on dense
government forms (CERFA). The public API is additive — existing behaviour is
unchanged except where noted as a fix.
Added
`FormField` now surfaces its text-formatting metadata. Each field exposes
`comb` (the `/Ff` comb flag for fixed-pitch character cells), `quadding` (the
`/Q` justification: 0 left, 1 centre, 2 right), and the default-appearance font
and size parsed from the field's `/DA` string as `daFont` / `daSize`. This lets
a host reproduce a field's intended layout (combed cells, alignment, font
metrics) without re-parsing the appearance stream.
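
As a rough illustration of what this metadata encodes, the sketch below derives the same values from raw PDF field entries. It is not this library's implementation: the function names are hypothetical, and the `max_len` argument (the field's `/MaxLen`, which a comb layout needs) is an assumption not mentioned in these notes.

```python
import re

COMB_FLAG = 1 << 24  # bit position 25 of /Ff in the PDF specification (Comb)
QUADDING = {0: "left", 1: "centre", 2: "right"}  # /Q values


def parse_da(da):
    """Extract (font name, size) from a /DA string such as '/Helv 12 Tf 0 g'."""
    m = re.search(r"/(\S+)\s+(\d*\.?\d+)\s+Tf", da)
    if m is None:
        return None, None
    return m.group(1), float(m.group(2))


def is_comb(ff):
    """True when the Comb bit is set in the field flags."""
    return bool(ff & COMB_FLAG)


def comb_cell_centres(field_width, max_len):
    """Horizontal centre of each fixed-pitch cell (hypothetical layout helper)."""
    cell = field_width / max_len
    return [(i + 0.5) * cell for i in range(max_len)]
```

A host would typically centre one glyph on each value returned by `comb_cell_centres` when `comb` is set, and otherwise align the whole string according to `quadding`.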
Fixed
- Spurious in-word spaces during text extraction. A new gap-aware
`runs_join` helper now drives all four reconstruction paths (lines,
paragraphs, lists, tables): a word split across several font runs no longer
emits a phantom space at each run boundary (e.g. `ENFANT S` is reassembled as
`ENFANTS`). Spacing is decided from the real inter-run gap, not the mere fact
that the text changed font.
- Form-field appearances double-rendered behind the editable text. A
`widget_appearances` flag makes `renderPageNoText` / `renderPageExcluding`
omit the `/AP` appearance streams of AcroForm widgets, so a filled field's
baked-in value no longer shows through underneath the live, editable overlay.
- Borderless prose misdetected as a table. A `line_has_gutter` guard now
requires a real inter-cell gutter before promoting a borderless block to a
table: a two-run-per-line prose notice is kept as prose, while genuine tables
(wide column gutters) are still recognised.
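
The gap-based decisions behind the first and third fixes can be sketched as follows. This is a minimal illustration, not the actual `runs_join` or `line_has_gutter` code: the `Run` type, the function names, and both thresholds (`space_ratio`, `gutter_ratio`) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    text: str
    x0: float         # left edge of the run
    x1: float         # right edge of the run
    font_size: float


def join_runs(runs, space_ratio=0.2):
    """Join runs left to right, adding a space only where the gap is wide enough."""
    parts = []
    prev = None
    for run in runs:
        if prev is not None:
            gap = run.x0 - prev.x1
            # A font change alone never adds a space; only the measured gap does.
            if gap > space_ratio * min(prev.font_size, run.font_size):
                parts.append(" ")
        parts.append(run.text)
        prev = run
    return "".join(parts)


def has_wide_gutter(runs, gutter_ratio=2.0):
    """True if any inter-run gap is wide enough to look like a column gutter."""
    return any(
        (b.x0 - a.x1) >= gutter_ratio * min(a.font_size, b.font_size)
        for a, b in zip(runs, runs[1:])
    )
```

With these thresholds, two runs `ENFANT` and `S` separated by a fraction of a point join into `ENFANTS`, while a line whose two runs sit an ordinary word space apart is not treated as a table row.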