
Add heading_segment: heading vs body classification + document outline #390

Merged: JE-Chen merged 1 commit into dev from feat/heading-segment-batch on Jun 23, 2026

@JE-Chen (Member) commented Jun 23, 2026

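Summary

Adds classify_lines / outline: separate headings from body text by box height and build a document outline. Nothing in the framework maps line height to heading levels or builds a section outline (ocr/structure and element_parse are purely positional; text_blocks does not rank).

This feature applies the standard heuristic: a line whose height exceeds heading_ratio × the median line height is a heading, and distinct heading heights become levels (tallest = 1). classify_lines tags each line with {box, text, role, level}; outline returns the headings in top-to-bottom order as a table of contents. Pure standard library, operating on plain line dicts; reuses the box-bounds reader from table_grid_fill. Qt-free.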
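The heuristic described above can be sketched roughly as follows. This is an illustrative sketch only, not the code merged in this PR: the `_sketch` function names, the `(left, top, right, bottom)` box layout, the default `heading_ratio=1.5`, and the use of level `0` for body lines are all assumptions made for the example.

```python
from statistics import median


def _height(box):
    # Assumption: box is (left, top, right, bottom); the real reader
    # reused from table_grid_fill may accept other layouts.
    return box[3] - box[1]


def classify_lines_sketch(lines, heading_ratio=1.5):
    """Tag each line dict with a role ("heading"/"body") and a level."""
    if not lines:
        return []
    heights = [_height(line["box"]) for line in lines]
    threshold = heading_ratio * median(heights)
    # Distinct heading heights, tallest first, become levels 1, 2, ...
    heading_heights = sorted({h for h in heights if h > threshold}, reverse=True)
    level_of = {h: i + 1 for i, h in enumerate(heading_heights)}
    tagged = []
    for line, h in zip(lines, heights):
        is_heading = h in level_of
        tagged.append({
            "box": line["box"],
            "text": line["text"],
            "role": "heading" if is_heading else "body",
            "level": level_of[h] if is_heading else 0,  # 0 for body is an assumption
        })
    return tagged


def outline_sketch(lines, heading_ratio=1.5):
    """Return only the heading lines, ordered top to bottom."""
    tagged = classify_lines_sketch(lines, heading_ratio)
    headings = [t for t in tagged if t["role"] == "heading"]
    return sorted(headings, key=lambda t: t["box"][1])
```

With one 40 px line, one 24 px line, and three 12 px lines, the median is 12, so the threshold is 18 and the two taller lines become headings at levels 1 and 2. Keying levels on exact heights is the simplest reading of "distinct heading heights"; real OCR output would likely need some tolerance when grouping near-equal heights.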

Five layers

  • Core: utils/heading_segment/ (classify_lines, outline).
  • Facade / Executor AC_classify_lines + AC_outline / MCP ac_classify_lines + ac_outline / Script Builder (OCR).
  • Docs: v176 EN + Zh + toctree. Changelog: root EN + zh-TW + zh-CN.

Tests

test_heading_segment_batch.py: heading/body tagging and levels, body-only input yields no headings, outline ordering, empty input, wiring + facade. 6 passed. ruff / bandit / radon / float-scan / Qt-free all clean.

Nothing mapped line height to heading levels or built a section outline;
ocr/structure and element_parse are positional and text_blocks doesn't rank.
Apply the standard heuristic: a line taller than heading_ratio x the median
line height is a heading, and distinct heading heights become levels (tallest =
1). classify_lines tags each line; outline returns the headings in order.
@codacy-production


Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 31 · duplication 0


@JE-Chen JE-Chen merged commit 88e4a85 into dev Jun 23, 2026
16 checks passed
@JE-Chen JE-Chen deleted the feat/heading-segment-batch branch June 23, 2026 22:36