v1.0.0

@ayoub-ibm ayoub-ibm released this 23 Jan 21:40
· 132 commits to main since this release

Features

Expanded Input Support
  • Add support for text (.txt), Markdown (.md), URLs, and serialized DoclingDocument inputs (023841f)
  • Introduce Input Normalization stage for automatic type detection, validation, and routing
  • Pipeline now skips OCR/segmentation for text inputs and reuses pre-processed DoclingDocuments
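As a rough illustration of the detection and routing described above, the sketch below classifies an input string and decides whether OCR/segmentation can be skipped. It is not the project's actual code: `detect_input_type`, `can_skip_ocr`, the type labels, and the assumption that a serialized DoclingDocument arrives as a `.json` file are all hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical labels for the plain-text formats named in this release.
TEXT_SUFFIXES = {".txt": "text", ".md": "markdown"}

# Input types assumed not to need OCR/segmentation.
SKIP_OCR_TYPES = {"text", "markdown", "docling_document"}


def detect_input_type(source: str) -> str:
    """Classify an input string by URL scheme or file extension."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return TEXT_SUFFIXES[suffix]
    if suffix == ".json":
        # Assumption: a serialized DoclingDocument is stored as JSON.
        return "docling_document"
    # Anything else falls through to the regular document path.
    return "document"


def can_skip_ocr(input_type: str) -> bool:
    """Text inputs and pre-processed documents bypass OCR/segmentation."""
    return input_type in SKIP_OCR_TYPES


if __name__ == "__main__":
    for src in ("notes.md", "https://example.com/report.pdf", "scan.pdf"):
        kind = detect_input_type(src)
        print(f"{src}: type={kind}, skip_ocr={can_skip_ocr(kind)}")
```

A URL is treated here as needing full processing; a real implementation might re-run detection on the downloaded content instead.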
CLI Enhancements
  • convert command now accepts new input formats (023841f)
  • Improved input validation, URL handling, and clearer error messages

Architecture

Input Normalization Layer
  • Pipeline expanded from 4 → 5 stages with a new first-stage normalization layer (023841f)
  • Modular detectors, validators, and handlers for each input type
  • Extraction stage updated to support pre-normalized and pre-processed inputs
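One way to picture the modular detector/validator/handler layout is a small registry with one entry per input type, consulted before the rest of the pipeline runs. This is a minimal sketch under assumptions, not the release's implementation: `InputHandler`, `HANDLERS`, `normalize`, and the `skip_conversion` flag are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InputHandler:
    """Hypothetical bundle of detector, validator, and routing flag."""
    name: str
    detect: Callable[[str], bool]
    validate: Callable[[str], None]
    skip_conversion: bool  # True when later stages can reuse the input as-is


def _require_non_empty(source: str) -> None:
    """Shared validator: reject blank inputs with a clear message."""
    if not source.strip():
        raise ValueError("input is empty")


# Order matters: the first handler whose detector matches wins.
HANDLERS = [
    InputHandler("url", lambda s: s.startswith(("http://", "https://")),
                 _require_non_empty, skip_conversion=False),
    InputHandler("markdown", lambda s: s.lower().endswith(".md"),
                 _require_non_empty, skip_conversion=True),
    InputHandler("text", lambda s: s.lower().endswith(".txt"),
                 _require_non_empty, skip_conversion=True),
    # Catch-all for inputs that still go through the full document path.
    InputHandler("document", lambda s: True,
                 _require_non_empty, skip_conversion=False),
]


def normalize(source: str) -> InputHandler:
    """First pipeline stage: pick a handler, validate, and return it."""
    for handler in HANDLERS:
        if handler.detect(source):
            handler.validate(source)
            return handler
    raise ValueError(f"unsupported input: {source!r}")
```

With this shape, supporting a new input type means adding one entry to the registry, and the extraction stage only needs to check the returned handler's `skip_conversion` flag.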

Security

  • docling: Bump the docling dependency to version 2.70.0 to address CVE-listed issues in nested dependencies (023841f)

Documentation

Input Format Documentation
  • New Input Formats page with CLI vs API support matrix (023841f)
  • Added examples for URL, Markdown, and DoclingDocument inputs
Architecture Diagrams
  • Updated all pipeline and architecture flowcharts (023841f)
  • Added a new diagram for the Input Normalization process