LiteDoc v2.1.1 Release Notes 🚀
🧬 Recursive Sub-Column DLA Engine
- Completely overhauled the Document Layout Analysis (DLA) segmenter.
- Added Recursive Sub-Column Detection: The engine now accurately detects nested gutters inside columns.
- Fixed: Severe text interleaving issues in multi-column scientific papers (e.g., JAMA publications). Floating sidebar quotes are now cleanly isolated from main body paragraphs, so the top-to-bottom reading order of each column is preserved.
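
For readers curious how nested gutters can be found, here is a minimal sketch of the general idea using a classic recursive XY-cut. It is not LiteDoc's actual implementation: the names `Box`, `split_on_gaps`, `reading_order`, `min_gutter`, and `min_row_gap` are hypothetical, and it assumes text boxes arrive as `(x0, y0, x1, y1)` tuples with `y` growing downward.

```python
from typing import List, Tuple

# Hypothetical box type: (x0, y0, x1, y1), with y growing downward.
Box = Tuple[float, float, float, float]


def split_on_gaps(boxes: List[Box], axis: int, min_gap: float) -> List[List[Box]]:
    """Group boxes along one axis, cutting wherever an empty gap >= min_gap appears.

    axis=0 cuts at vertical gutters (columns); axis=1 cuts at horizontal gaps (rows).
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest edge covered so far on this axis
    for box in ordered[1:]:
        if box[axis] - reach >= min_gap:
            groups.append([box])  # empty band found: start a new group
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def reading_order(
    boxes: List[Box], min_gutter: float = 10.0, min_row_gap: float = 10.0
) -> List[Box]:
    """Return boxes in reading order by alternating column and row cuts."""
    if len(boxes) <= 1:
        return list(boxes)

    # 1. Try to cut at vertical gutters; read columns left to right.
    columns = split_on_gaps(boxes, axis=0, min_gap=min_gutter)
    if len(columns) > 1:
        return [b for col in columns for b in reading_order(col, min_gutter, min_row_gap)]

    # 2. No gutter spans this region, so cut into rows; a row may still
    #    hide a nested gutter, which the recursive call picks up.
    rows = split_on_gaps(boxes, axis=1, min_gap=min_row_gap)
    if len(rows) > 1:
        return [b for row in rows for b in reading_order(row, min_gutter, min_row_gap)]

    # 3. Nothing left to cut: fall back to top-to-bottom, then left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))
```

Calling `reading_order(boxes)` on a column that contains a two-up block between full-width paragraphs reads the paragraph above, then the left sub-column, then the right sub-column, then the paragraph below. The gap thresholds are illustrative and would need tuning to the page's font size and units.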
📝 A Quick Note on PDF Quality & Development
I am continuously experimenting with creative ways to push the PDF extraction engine further.
However, please remember that this tool is not magic: extraction quality inherently depends on the structure, formatting, and text encoding of the original PDF you upload.
To those offering constructive feedback, thank you! To the critics expecting perfection: please remember that this is a free, local tool that is still in active development and constantly evolving.
☕ Support the project: If this tool saves you time (and LLM tokens), consider buying me a coffee!