v0.2.0

@conjuncts conjuncts released this 04 Sep 04:51
· 105 commits to main since this release

Features:

  • Multiple headers; multi-index tables (6225043)
  • Spanning cells on both the top and left (bbbbd7c)
  • Captions for tables (ca18bcc)
  • "Margin" parameter allows text outside of table bbox to be included (ab81f22)
  • Return visualized images as PIL images; allow padding or a margin around the visualized image (ab81f22)
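
To illustrate the idea behind the "Margin" parameter, here is a minimal sketch in plain Python. It does not use the library's API: the helper names (`expand_bbox`, `words_within`), the `(x0, y0, x1, y1)` box layout, and the containment rule are all assumptions made for illustration. The actual implementation may select text differently.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


def expand_bbox(bbox: BBox, margin: float) -> BBox:
    """Grow a bbox by `margin` on every side (hypothetical helper)."""
    x0, y0, x1, y1 = bbox
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def contains(outer: BBox, inner: BBox) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def words_within(words: List[Tuple[str, BBox]], table_bbox: BBox,
                 margin: float = 0.0) -> List[str]:
    """Return the words whose boxes fall inside the table bbox plus margin."""
    region = expand_bbox(table_bbox, margin)
    return [text for text, box in words if contains(region, box)]


table = (100.0, 100.0, 300.0, 200.0)
words = [
    ("Revenue", (110.0, 110.0, 160.0, 125.0)),  # inside the table bbox
    ("(USD)", (302.0, 110.0, 330.0, 125.0)),    # just right of the bbox
    ("Footer", (100.0, 400.0, 150.0, 415.0)),   # far below the table
]

without_margin = words_within(words, table)
with_margin = words_within(words, table, margin=40.0)
```

With no margin only "Revenue" is picked up; with a margin of 40 the adjacent "(USD)" is included as well, while the distant "Footer" is still excluded.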

Several tweaks to the formatting algorithm may produce different outputs than prior versions.

  • Automatically drop rows whose only non-null value is in the "is_projecting_row" column
  • Fill in gaps between table rows, to reduce skipped text
  • Non-maximum suppression (NMS), as seen in inference.py (ab81f22)
    • the "total overlap" metric is now less useful; prefer "rows removed by NMS"
  • Widen rows to the same length
  • Several tweaks to conditions, parameters, heuristics
    • superscripts/subscripts are now more likely to be merged into their parent rows
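
For readers unfamiliar with non-maximum suppression, the following is a minimal, generic sketch of greedy NMS over row boxes. It is not the code from inference.py or from this library. The overlap measure used here, intersection over the candidate box's own area (IoB), the `(score, box)` input shape, and the default threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)
Scored = Tuple[float, BBox]               # (confidence, box)


def area(b: BBox) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iob(box: BBox, other: BBox) -> float:
    """Intersection area divided by the area of `box` (assumed definition)."""
    inter = (max(box[0], other[0]), max(box[1], other[1]),
             min(box[2], other[2]), min(box[3], other[3]))
    a = area(box)
    return area(inter) / a if a > 0 else 0.0


def nms(rows: List[Scored], iob_threshold: float = 0.5):
    """Greedy NMS: visit boxes by descending confidence and keep a box only
    if its overlap with every already-kept box is at or below the threshold."""
    kept: List[Scored] = []
    for score, box in sorted(rows, key=lambda r: r[0], reverse=True):
        if all(iob(box, k[1]) <= iob_threshold for k in kept):
            kept.append((score, box))
    return kept, len(rows) - len(kept)  # second value: rows removed by NMS


rows = [
    (0.95, (0.0, 0.0, 100.0, 20.0)),
    (0.60, (0.0, 2.0, 100.0, 22.0)),   # near-duplicate of the first row
    (0.90, (0.0, 20.0, 100.0, 40.0)),  # touches the first row, no overlap
]
kept, removed = nms(rows)
```

In this example the low-confidence near-duplicate is suppressed, so `removed` is 1 and the two distinct rows survive.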

Many potentially breaking changes to the config.

  • TableDetectorConfig.confidence_score_threshold has been renamed to TableDetectorConfig.detector_base_threshold
  • TableFormatter.deduplication_iob_threshold has been removed in favor of nms_iob_threshold
  • spanning_cell_minimum_width, corner_clip_outlier_threshold, and aggregate_spanning_cells have been removed
  • Tweaks to default settings may yield different results
  • no_timm is now the default, which fixes #1.
    • this might cause slightly different bboxes
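
When upgrading, it may help to scan existing settings for the renamed and removed options listed above. The sketch below works on a plain dict of overrides and does not import the library. The function name `migrate_overrides` and the flat-dict representation are hypothetical, and the detector and formatter options are mixed in one dict only for brevity. The value of deduplication_iob_threshold is deliberately not carried over to nms_iob_threshold, since the notes do not say the two are numerically equivalent.

```python
from typing import Any, Dict, List, Tuple

# Option names taken from the release notes above.
RENAMED = {"confidence_score_threshold": "detector_base_threshold"}
REMOVED = {
    "deduplication_iob_threshold": "removed; consider nms_iob_threshold",
    "spanning_cell_minimum_width": "removed",
    "corner_clip_outlier_threshold": "removed",
    "aggregate_spanning_cells": "removed",
}


def migrate_overrides(old: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (updated overrides, human-readable notes). Hypothetical helper."""
    new: Dict[str, Any] = {}
    notes: List[str] = []
    for key, value in old.items():
        if key in RENAMED:
            new[RENAMED[key]] = value  # same value under the new name
            notes.append(f"{key} -> {RENAMED[key]}")
        elif key in REMOVED:
            notes.append(f"{key}: {REMOVED[key]}")  # dropped, value not reused
        else:
            new[key] = value  # unknown to this sketch, passed through as-is
    return new, notes


old_overrides = {
    "confidence_score_threshold": 0.9,
    "aggregate_spanning_cells": True,
    "some_other_option": 1,  # placeholder key, not a real option name
}
new_overrides, notes = migrate_overrides(old_overrides)
for note in notes:
    print(note)
```

Because defaults also changed in this release, outputs may still differ after such a migration even when no option names need updating.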