Table parser #22
Merged
…X models and path extraction.
…ms, including debug path visualization and algorithm tracking.
…lizations, and tracing.
…le parsing and error handling.
…postprocessing, and add new hard sample examples.
…refining bounding box probabilities with softmax, and applying NMS to detected rows and columns.
…t LFS for CI builds
Summary
Add end-to-end table structure recognition to Ferrules. Tables detected by the layout model are now parsed into structured TableBlock objects with rows, cells, column/row spans, and header detection, ready for HTML rendering.
Architecture
The table parser uses a three-algorithm fallback strategy, selecting the best approach based on the table characteristics:
Lattice — Path-based parsing for tables with visible borders. Extracts horizontal/vertical lines from PDF path objects and uses their intersections to derive the grid structure. Handles column spans by detecting missing vertical boundaries.
Vision — ML-based parsing using Microsoft Table Transformer (DETR) for borderless or complex tables. The model runs as a FP16 ONNX model with support for CoreML, CUDA, TensorRT, and CPU execution providers.
Stream — Text-alignment fallback for tables where neither paths nor the vision model produce usable results. Clusters text lines by vertical proximity and infers column boundaries from horizontal gaps.
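For the Vision path, the DETR postprocessing steps this PR describes (softmax, cx/cy/w/h to bbox conversion, confidence thresholding) can be sketched roughly as follows. This is a minimal illustration, not the code in table.rs: the `Detection` struct, the `postprocess` signature, and the convention that the last logit is the "no object" class are assumptions based on how DETR-style models are commonly decoded.

```rust
/// Hypothetical detection record: a class, its probability, and a pixel-space box.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub class_id: usize,
    pub score: f32,
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

/// Numerically stable softmax over one query's class logits.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Decode DETR-style outputs into pixel-space detections.
/// `logits[q]` holds one logit per class plus a trailing "no object" logit (assumed).
/// `boxes[q]` holds a normalized (cx, cy, w, h) box for the same query.
pub fn postprocess(
    logits: &[Vec<f32>],
    boxes: &[[f32; 4]],
    img_w: f32,
    img_h: f32,
    threshold: f32,
) -> Vec<Detection> {
    logits
        .iter()
        .zip(boxes)
        .filter_map(|(query_logits, query_box)| {
            let probs = softmax(query_logits);
            // Ignore the trailing "no object" class when picking the best label.
            let real_classes = &probs[..probs.len().saturating_sub(1)];
            let (class_id, score) = real_classes
                .iter()
                .copied()
                .enumerate()
                .max_by(|lhs, rhs| lhs.1.total_cmp(&rhs.1))?;
            // Confidence thresholding: drop weak queries.
            if score < threshold {
                return None;
            }
            // Center/size to corner conversion, scaled to image pixels.
            let [cx, cy, w, h] = *query_box;
            Some(Detection {
                class_id,
                score,
                x0: (cx - w / 2.0) * img_w,
                y0: (cy - h / 2.0) * img_h,
                x1: (cx + w / 2.0) * img_w,
                y1: (cy + h / 2.0) * img_h,
            })
        })
        .collect()
}
```

Non-maximum suppression over the surviving row and column boxes would run after this step and is not shown here.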
The parser automatically evaluates Lattice results and falls back to Vision when the table looks "suspicious" (too few cells, too few rows, or insufficient area coverage).
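The "suspicious" check could look something like the sketch below. It only illustrates the three criteria named above (cell count, row count, area coverage). The `Rect` and `SuspicionThresholds` types, the function name, and any threshold values are hypothetical and are not taken from the PR.

```rust
/// Hypothetical axis-aligned rectangle in page coordinates.
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub x0: f32,
    pub y0: f32,
    pub x1: f32,
    pub y1: f32,
}

impl Rect {
    /// Area of the rectangle; degenerate rectangles count as zero.
    pub fn area(&self) -> f32 {
        (self.x1 - self.x0).max(0.0) * (self.y1 - self.y0).max(0.0)
    }
}

/// Hypothetical tuning knobs; the real parser's thresholds are not stated in the PR.
pub struct SuspicionThresholds {
    pub min_cells: usize,
    pub min_rows: usize,
    pub min_coverage: f32,
}

/// Returns true when a Lattice result should be discarded in favor of Vision.
/// `rows` holds the cell rectangles of each parsed row.
pub fn looks_suspicious(table: Rect, rows: &[Vec<Rect>], t: &SuspicionThresholds) -> bool {
    let cell_count: usize = rows.iter().map(Vec::len).sum();
    let covered: f32 = rows.iter().flatten().map(Rect::area).sum();
    // Fraction of the table's bounding box accounted for by parsed cells.
    let coverage = if table.area() > 0.0 {
        covered / table.area()
    } else {
        0.0
    };
    cell_count < t.min_cells || rows.len() < t.min_rows || coverage < t.min_coverage
}
```

Keeping the thresholds in a struct makes it easy to tune them against hard samples without touching the decision logic.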
Key Changes
New Modules
ferrules-core/src/parse/table.rs (~1000 lines), the core table parsing module, containing:
  TableParser: Orchestrates the three parsing algorithms
  TableTransformer: ONNX inference wrapper with image preprocessing (ImageNet normalization, dynamic scaling) and DETR postprocessing (softmax, cx/cy/w/h → bbox conversion, confidence thresholding)
  ParseTableQueue / ParseTableRequest: Async message-passing interface matching the existing layout queue pattern
  tokio::sync::Semaphore
python/export_table_transformer.py: Script to export the HuggingFace model to ONNX with FP16 quantization
Data Model
TableBlock, TableRow, TableCell, TableAlgorithm
BlockType::Table now wraps a TableBlock instead of being a unit variant
ElementType::Table carries an Option<TableBlock> for deferred population
PDFPath and Segment entities for representing extracted vector paths
CharSpan and Line are now Clone to support shared ownership across parsing stages
Integration
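As a rough illustration of how a row/cell/span model feeds HTML output with colspan/rowspan attributes and header rows, here is a minimal sketch. The struct fields below are simplified, hypothetical stand-ins that merely reuse the type names from this PR; the real definitions in Ferrules may differ, and the real renderer handles nested cell content recursively, whereas this sketch only handles plain text.

```rust
/// Hypothetical, simplified cell: plain text plus span counts.
pub struct TableCell {
    pub text: String,
    pub col_span: u32,
    pub row_span: u32,
}

/// Hypothetical row: a list of cells and a header flag.
pub struct TableRow {
    pub cells: Vec<TableCell>,
    pub is_header: bool,
}

/// Hypothetical table: an ordered list of rows.
pub struct TableBlock {
    pub rows: Vec<TableRow>,
}

/// Escape the characters that would otherwise break the markup.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Render the table as compact HTML, emitting span attributes only when > 1.
pub fn render_table(table: &TableBlock) -> String {
    let mut html = String::from("<table>");
    for row in &table.rows {
        html.push_str("<tr>");
        // Header rows use <th>, body rows use <td>.
        let tag = if row.is_header { "th" } else { "td" };
        for cell in &row.cells {
            html.push('<');
            html.push_str(tag);
            if cell.col_span > 1 {
                html.push_str(&format!(" colspan=\"{}\"", cell.col_span));
            }
            if cell.row_span > 1 {
                html.push_str(&format!(" rowspan=\"{}\"", cell.row_span));
            }
            html.push('>');
            html.push_str(&escape(&cell.text));
            html.push_str(&format!("</{}>", tag));
        }
        html.push_str("</tr>");
    }
    html.push_str("</table>");
    html
}
```

Omitting span attributes when the span is 1 keeps the output small for ordinary grid tables.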
page.rs: Table elements are dispatched to the table queue concurrently via JoinSet, results are joined back before block merging
document.rs: FerrulesParser now owns a ParseTableQueue alongside the existing layout and native queues
native.rs: Extracts PDF path objects (lines, rects) before page.flatten() to avoid segfaults from stale pdfium pointers
html.rs: Full <table> rendering with colspan/rowspan attributes, recursive cell content rendering, and header row support (<th>)
draw.rs: Table structure visualization (rows, cells, text) and PDF path overlay for debugging
blocks.rs, merge.rs: Table blocks can now be merged; table images are saved as cropped PNGs
CI / Build
.gitattributes configured for *.onnx LFS tracking
.gitignore updated to allow the specific model file through
half crate dependency for FP16 tensor support
half feature flag on ort
Other
TableTransformerModelError
Testing