DataFlow v1.0.10 Release Note

@SunnyHaze SunnyHaze released this 26 Mar 13:55

DataFlow v1.0.10 Release Notes (v1.0.9 → v1.0.10)

🛠️ DataFlow v1.0.10 — Stability Improvements & Refinements

v1.0.10 focuses on stability, usability, and attention to detail.

Following the large-scale engineering upgrades in v1.0.9, this release emphasizes:

  • Fixing real-world issues in pipelines and operators
  • Strengthening PDF / VQA / QA related workflows
  • Improving documentation and developer experience
  • Refining engineering details (params, dependencies, error handling)

This release makes DataFlow more reliable and easier to use.


🔑 Highlights

🧩 Pipeline & Multimodal Enhancements

  • Introduced pdf2model pipeline with integrated VQA + KBC
  • Improved pdf2vqa pipeline (bug fixes & feature updates)
  • Fixed the wrong Mineru class used in the pdf2qa API pipeline and removed a redundant pipeline
  • Added flash-mineru to the requirements for faster PDF parsing

Thanks to @Heinz217, @wongzhenhao, @ZhaoyangHan04.


🛠️ Core Fixes & Improvements

  • Fixed an initialization error in the KBChunker recursive chunker
  • Fixed a bug in llm_output_parser where redundant concatenation of relative image paths prevented Markdown from rendering
  • Corrected output_question_key typo in Text2QAGenerator
  • Filtered empty rows in PromptedFilter & Text2QAGenerator
  • Added a dummy output_key parameter to PandasOperator to bypass a pipeline compile error

Thanks to @AirAgentSDE, @liangrenjuan, @memoryforget, @fatty-belly.
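
To illustrate the empty-row fix listed above, here is a minimal sketch of dropping blank rows from a pandas DataFrame before they reach a prompt-based operator. This is not the code from PromptedFilter or Text2QAGenerator; the function name `drop_empty_rows`, its `key` parameter, and the column name `text` are hypothetical and used only for illustration.

```python
import pandas as pd


def drop_empty_rows(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return only the rows whose `key` column holds non-blank text.

    Hypothetical helper for illustration; not part of the DataFlow API.
    """
    text = df[key]
    # Treat missing values and whitespace-only strings as empty.
    is_filled = text.notna() & text.astype(str).str.strip().ne("")
    return df[is_filled].reset_index(drop=True)


rows = pd.DataFrame(
    {"text": ["What is DataFlow?", "", None, "   ", "Explain chunking."]}
)
cleaned = drop_empty_rows(rows, "text")
print(cleaned)
```

Filtering before generation avoids sending empty prompts to the model and keeps the output rows aligned with meaningful inputs.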


📦 Engineering & DX Improvements

  • Added get_version() helper
  • Updated manifest.in to include all files in scaffold
  • Various improvements in error handling and engineering details

Thanks to @haosenwang1018, @SunnyHaze.
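
The release notes do not show how get_version() is implemented or called. As a rough sketch of what such a helper commonly looks like, the following reads the installed version from package metadata with the standard library. The signature, the default distribution name `open-dataflow`, and the `"unknown"` fallback are assumptions, not the actual DataFlow implementation.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package: str = "open-dataflow", default: str = "unknown") -> str:
    """Look up the installed version of `package`, or return `default`.

    The distribution name and the fallback value are assumptions made
    for this sketch; check the DataFlow source for the real helper.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return default


print(get_version())
```

Reading the version from installed metadata avoids hard-coding it in more than one place.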


📚 README Updates

  • Enhanced operator usage examples
  • Added uv installation guide
  • Fixed numbering and anchor link issues in README
  • General clarity improvements

👨‍💻 New Contributors

Welcome to our new contributors:

@AirAgentSDE · @liangrenjuan · @Heinz217 · @memoryforget


🔗 Full Changelog

👉 v1.0.9...v1.0.10

Raw Release Note

What's Changed

  • fix KBChunker recursive chunker initialization error by @AirAgentSDE in #478
  • remove extra pipeline & fix pdf2qa api pipeline wrong Mineru class. by @ZhaoyangHan04 in #479
  • feat: add get_version() helper by @haosenwang1018 in #481
  • [README] enhanced with operator usage examples and uv installation instruction by @SunnyHaze in #485
  • fix: fix a bug in llm_output_parser where redundant concatenation of relative image paths prevented Markdown from rendering by @liangrenjuan in #486
  • [pipeline & req]: implement pdf2model with integrated VQA and KBC; update flash-mineru to req.txt by @Heinz217 in #487
  • [README] revise readme for clarity by @SunnyHaze in #489
  • fix(core_text): correct output_question_key typo in Text2QAGenerator by @memoryforget in #491
  • [README] fix number issue and a jump link issue by @SunnyHaze in #492
  • pdf2vqa: bug fixed && feature update by @wongzhenhao in #493
  • Add a dummy output_key param in PandasOperator to bypass pipeline compile error by @fatty-belly in #494
  • fix(core_text): filter out empty rows in PromptedFilter and Text2QAGenerator by @memoryforget in #495
  • [pyproject] update manifest.in to include all file into scaffold by @SunnyHaze in #496

New Contributors

Full Changelog: v1.0.9...v1.0.10