📃 Rapid Doc

🚀 Work In Progress

The overall functionality is not finished yet. Contributors are welcome to join in!

📝 Introduction

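This project extracts content from document images and reproduces it one-to-one in a Word or TXT file for further use or processing. Planned future support: PDF or image input, with output in JSON, TXT, Word, and Markdown formats.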

🛠️ Overall Framework

The overall framework depends on the packages shown below, all of which are produced by RapidAI.

flowchart TD
    A[/Document image/] --> B([Orientation classification rapid_orientation]) --> C([Layout analysis rapid_layout])
    C --> D([Table recognition rapid_table]) & E([Formula recognition rapid_latex_ocr]) & F([Text recognition rapidocr_onnxruntime]) --> G([Layout recovery rapid_layout_recover])
    G --> H[/Structured output/]
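
The flow in the chart can be read as a simple orchestration loop. The following is a minimal sketch of that flow, not the project's actual implementation: `Region`, `run_pipeline`, and `demo` are hypothetical names, and every stage is passed in as a plain callable rather than calling the real RapidAI packages, whose APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Region:
    """Hypothetical container for one layout region."""
    kind: str                        # "text", "table", or "formula"
    box: Tuple[int, int, int, int]   # (x0, y0, x1, y1)
    content: str = ""


def run_pipeline(
    image: Any,
    classify_orientation: Callable[[Any], int],
    analyze_layout: Callable[[Any, int], List[Region]],
    recognizers: Dict[str, Callable[[Any, Region], str]],
    recover_layout: Callable[[List[Region]], str],
) -> str:
    # 1. Orientation classification
    angle = classify_orientation(image)
    # 2. Layout analysis splits the page into typed regions
    regions = analyze_layout(image, angle)
    # 3. Each region goes to the recognizer that matches its type
    for region in regions:
        region.content = recognizers[region.kind](image, region)
    # 4. Layout recovery assembles the structured output
    return recover_layout(regions)


def demo() -> str:
    # Stand-in stages (assumptions) so the sketch runs without any model.
    regions = [
        Region("text", (0, 50, 100, 80)),
        Region("table", (0, 0, 100, 40)),
    ]
    return run_pipeline(
        image=None,
        classify_orientation=lambda img: 0,
        analyze_layout=lambda img, angle: regions,
        recognizers={
            "text": lambda img, r: "hello",
            "table": lambda img, r: "<table/>",
        },
        # Order regions top to bottom by their y0 coordinate.
        recover_layout=lambda rs: "\n".join(
            r.content for r in sorted(rs, key=lambda r: r.box[1])
        ),
    )
```

In a real run, each stand-in lambda would be replaced by a wrapper around the corresponding package from the chart.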

📑 Input and Output

Input: document images
Output: TXT or Word

💻 Install the Runtime Environment

pip install -r requirements.txt

🚀 Run the Demo

git clone https://github.com/RapidAI/RapidDoc.git
cd RapidDoc
python demo.py

📈 Example Results

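⚠️ Note: The extracted result is not split into paragraphs because the layout analysis model has no paragraph detection. None of the existing open-source layout analysis models offers paragraph detection; we are considering training our own layout analysis model later to improve this.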
