
Add text_blocks: paragraph & list grouping of OCR lines #387

Merged: JE-Chen merged 1 commit into dev from feat/text-blocks-batch on Jun 23, 2026

Conversation

JE-Chen (Member) commented Jun 23, 2026

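Summary

Adds group_paragraphs / detect_lists: group OCR lines into paragraphs and detect bulleted / numbered lists. text_regions.find_text_lines merges glyphs into lines, but nothing grouped those lines into paragraphs or detected lists; ocr/structure stops at flat rows.

group_paragraphs starts a new paragraph wherever the vertical gap exceeds line_gap_factor × the median line height (the standard whitespace-grouping heuristic). detect_lists recognises list items by their leading marker (bullets such as - and *, ordinals such as 1. / 2) / a.) and their left indent, and returns {text, marker, indent, box}. Pure standard library, operating on plain line dicts; it reuses the box-bounds reader from table_grid_fill. Qt-free.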
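The gap rule described above can be sketched roughly as follows. This is an illustration only, not the merged implementation: the function name `group_paragraphs_sketch`, the line-dict layout `{"text": ..., "box": (x, y, w, h)}`, and the default `line_gap_factor=0.6` are all assumptions, since the PR does not show the real signature or the box format read by the table_grid_fill helper.

```python
from statistics import median


def group_paragraphs_sketch(lines, line_gap_factor=0.6):
    """Group line dicts into paragraphs by vertical whitespace.

    Assumed (hypothetical) input: each line is
    {"text": str, "box": (x, y, w, h)} with y growing downwards.
    """
    if not lines:
        return []

    # Read lines top to bottom.
    ordered = sorted(lines, key=lambda ln: ln["box"][1])
    # Median height makes the threshold robust to one oversized line.
    median_height = median(ln["box"][3] for ln in ordered)
    threshold = line_gap_factor * median_height

    paragraphs = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        # Gap between the bottom of the previous line and the top of this one.
        gap = cur["box"][1] - (prev["box"][1] + prev["box"][3])
        if gap > threshold:
            paragraphs.append([cur])      # large gap: start a new paragraph
        else:
            paragraphs[-1].append(cur)    # small gap: same paragraph
    return paragraphs
```

With three 10-pixel-high lines at y = 0, 12 and 50, the gaps are 2 and 28 against a threshold of 6, so the first two lines form one paragraph and the third starts another.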

Five layers

  • Core: utils/text_blocks/ (group_paragraphs, detect_lists).
  • Facade / Executor AC_group_paragraphs + AC_detect_lists / MCP ac_group_paragraphs + ac_detect_lists / Script Builder (OCR).
  • Docs: v174 EN + Zh + toctree. Changelog: root EN + zh-TW + zh-CN.

Tests

test_text_blocks_batch.py: large-gap paragraph split, single paragraph, bullet / ordinal detection, indent recording, empty input, wiring + facade. 7 passed. ruff / bandit / radon / float-scan / Qt-free all clean.

text_regions merges glyphs into lines but nothing grouped those lines into
paragraphs or detected lists; ocr/structure stops at flat rows. group_paragraphs
starts a new paragraph wherever the vertical gap exceeds line_gap_factor x the
median line height; detect_lists recognises bullet / ordinal items by their
leading marker and left indent.
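Marker-and-indent detection of this kind might look like the sketch below. It is an assumption-laden illustration rather than the code in this PR: the name `detect_lists_sketch`, the regular expression, the `{"text": ..., "box": (x, y, w, h)}` line layout, the choice to strip the marker from the returned `text`, and measuring `indent` relative to the leftmost line are all hypothetical.

```python
import re

# Hypothetical marker pattern: "-" or "*" bullets, or ordinals such as
# "1.", "2)" and "a.", followed by whitespace and the item body.
_MARKER = re.compile(
    r"^\s*(?P<marker>[-*]|\d+[.)]|[A-Za-z][.)])\s+(?P<body>.+)$"
)


def detect_lists_sketch(lines):
    """Return one {text, marker, indent, box} dict per list-like line.

    Assumed (hypothetical) input: each line is
    {"text": str, "box": (x, y, w, h)}.
    """
    # Measure indent relative to the leftmost line (an assumption).
    base_left = min((ln["box"][0] for ln in lines), default=0)

    items = []
    for ln in lines:
        match = _MARKER.match(ln["text"])
        if match is None:
            continue  # no leading marker: not a list item
        items.append({
            "text": match.group("body"),
            "marker": match.group("marker"),
            "indent": ln["box"][0] - base_left,
            "box": ln["box"],
        })
    return items
```

Requiring whitespace after the marker keeps ordinary prose such as "e.g. foo" from being read as an `e.` item; lines without a marker are skipped, and an empty input yields an empty list.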
@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 34 · duplication 0

@JE-Chen JE-Chen merged commit 49e8731 into dev Jun 23, 2026
16 checks passed
@JE-Chen JE-Chen deleted the feat/text-blocks-batch branch June 23, 2026 21:57