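DataFlow v1.0.10 Changelog (v1.0.9 → v1.0.10)
🛠️ DataFlow v1.0.10: Stability Enhancements and Detail Refinements
v1.0.10 is an iteration centered on stability, usability, and attention to detail.
Compared with the large-scale engineering upgrade in 1.0.9, this version focuses more on:
- Fixing real-world usage issues in key pipelines and operators
- Improving the stability of the PDF / VQA / QA workflows
- Continuing to improve the README documentation and the developer experience
- Filling in engineering details (parameters, dependencies, exception handling, etc.)
This update is about making DataFlow more reliable and easier to use.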
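🔑 Key Updates
🧩 Pipeline and Multimodal Capability Enhancements
- Implemented the pdf2model pipeline (integrating VQA + KBC)
- Continued to improve the pdf2vqa pipeline (bug fixes and feature enhancements)
- Fixed the incorrect Mineru class and the redundant pipeline in the pdf2qa pipeline
- Updated dependencies, adding `flash-mineru` to strengthen PDF parsing
Thanks to @Heinz217, @wongzhenhao, @ZhaoyangHan04.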
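🛠️ Core Module Fixes and Optimizations
- Fixed the recursive chunker initialization issue in KBChunker
- Fixed a Markdown rendering failure caused by image path concatenation in `llm_output_parser`
- Fixed the `output_question_key` typo in Text2QAGenerator
- Filtered out empty rows in PromptedFilter and Text2QAGenerator
- Added a dummy `output_key` to PandasOperator to avoid pipeline compile errors
Thanks to @AirAgentSDE, @liangrenjuan, @memoryforget, @fatty-belly.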
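📦 Engineering Experience and Tooling Enhancements
- Added a `get_version()` helper
- Updated `manifest.in` so that the scaffold includes all files
- Multiple improvements to exception handling and engineering details
Thanks to @haosenwang1018, @SunnyHaze.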
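📚 README and Documentation Improvements
- Expanded the operator usage examples
- Added a uv installation guide
- Fixed numbering and jump-link issues in the README
- Several rounds of documentation clarity improvements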
👨‍💻 New Contributors
Welcome to the following new contributors joining the DataFlow community:
@AirAgentSDE · @liangrenjuan · @Heinz217 · @memoryforget
🔗 Full Changelog
DataFlow v1.0.10 Release Notes(v1.0.9 → v1.0.10)
🛠️ DataFlow v1.0.10 — Stability Improvements & Refinements
v1.0.10 focuses on stability, usability, and polishing details.
Following the large-scale engineering upgrades in v1.0.9, this release emphasizes:
- Fixing real-world issues in pipelines and operators
- Strengthening PDF / VQA / QA related workflows
- Improving documentation and developer experience
- Refining engineering details (params, dependencies, error handling)
This is a release that makes DataFlow more robust and production-ready.
🔑 Highlights
🧩 Pipeline & Multimodal Enhancements
- Introduced pdf2model pipeline with integrated VQA + KBC
- Improved pdf2vqa pipeline (bug fixes & feature updates)
- Fixed Mineru class issue and redundant pipeline in pdf2qa
- Updated dependencies, adding `flash-mineru` for faster PDF parsing
Thanks to @Heinz217, @wongzhenhao, @ZhaoyangHan04.
🛠️ Core Fixes & Improvements
- Fixed KBChunker recursive initialization issue
- Fixed Markdown rendering bug caused by image path duplication in `llm_output_parser`
- Corrected `output_question_key` typo in Text2QAGenerator
- Filtered empty rows in PromptedFilter & Text2QAGenerator
- Added dummy `output_key` in PandasOperator to bypass compile errors
Thanks to @AirAgentSDE, @liangrenjuan, @memoryforget, @fatty-belly.
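To illustrate the empty-row filtering mentioned above, here is a minimal, library-independent sketch. It is not the actual PromptedFilter or Text2QAGenerator code: the helper names `is_blank` and `drop_empty_rows`, the `"text"` key, and the list-of-dicts row format are all assumptions made for the example.

```python
from typing import Any, Dict, List


def is_blank(value: Any) -> bool:
    """Return True for None, NaN, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return False


def drop_empty_rows(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Keep only rows whose `key` field holds non-blank content (hypothetical helper)."""
    return [row for row in rows if not is_blank(row.get(key))]


# Hypothetical input: only two of these rows carry usable text.
rows = [
    {"text": "DataFlow cleans data."},
    {"text": "   "},
    {"text": None},
    {"other": "no text field"},
    {"text": "Second passage."},
]
kept = drop_empty_rows(rows, "text")
print(len(kept))
```

Dropping such rows before prompting avoids spending model calls on inputs that cannot produce a meaningful answer.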
📦 Engineering & DX Improvements
- Added `get_version()` helper
- Updated `manifest.in` to include all files in scaffold
- Various improvements in error handling and engineering details
Thanks to @haosenwang1018, @SunnyHaze.
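For readers unfamiliar with version helpers, the sketch below shows one common way such a function can be written with the standard library. It is an assumption-laden illustration, not the `get_version()` added in #481: the signature, the `default` parameter, and the distribution name used in the call are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed version of `distribution`, or `default` if it is not installed.

    Hypothetical signature; the real DataFlow helper may differ.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return default


# A distribution name that is assumed not to be installed falls back to the default.
print(get_version("no-such-distribution-xyz-12345"))
```

Reading the version from installed package metadata keeps a single source of truth, so the reported version cannot drift from what the packaging configuration declares.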
📚 README Updates
- Enhanced operator usage examples
- Added uv installation guide
- Fixed numbering and anchor link issues in README
- General clarity improvements
👨‍💻 New Contributors
Welcome our new contributors:
@AirAgentSDE · @liangrenjuan · @Heinz217 · @memoryforget
🔗 Full Changelog
Raw Release Note
What's Changed
- fix KBChunker recursive chunker initialization error by @AirAgentSDE in #478
- remove extra pipeline & fix pdf2qa api pipeline wrong Mineru class. by @ZhaoyangHan04 in #479
- feat: add get_version() helper by @haosenwang1018 in #481
- [README] enhanced with operator usage examples and uv installation instruction by @SunnyHaze in #485
- fix: fix a bug in llm_output_parser where redundant concatenation of relative image paths prevented Markdown from rendering by @liangrenjuan in #486
- [pipeline & req]: implement pdf2model with integrated VQA and KBC; update flash-mineru to req.txt by @Heinz217 in #487
- [README] revise readme for clarity by @SunnyHaze in #489
- fix(core_text): correct output_question_key typo in Text2QAGenerator by @memoryforget in #491
- [README] fix number issue and a jump link issue by @SunnyHaze in #492
- pdf2vqa: bug fixed && feature update by @wongzhenhao in #493
- Add a dummy output_key param in PandasOperator to bypass pipeline compile error by @fatty-belly in #494
- fix(core_text): filter out empty rows in PromptedFilter and Text2QAGenerator by @memoryforget in #495
- [pyproject] update manifest.in to include all file into scaffold by @SunnyHaze in #496
New Contributors
- @AirAgentSDE made their first contribution in #478
- @liangrenjuan made their first contribution in #486
- @Heinz217 made their first contribution in #487
- @memoryforget made their first contribution in #491
Full Changelog: v1.0.9...v1.0.10