PDF-to-LaTeX: restore figure scale #10

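## Problem

During OCR transcription, the `FigureRegion.bbox` returned by the AI holds the figure's normalized coordinates on the page, `[x, y, width, height]` (range 0-1). The generated LaTeX, however, uses `\includegraphics{figures/xxx.png}` **with no width specified**, so each figure is rendered at its native pixel size, far larger than in the original book's layout.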

## Available data

All the required metadata already exists (no changes to the OCR pipeline are needed):

| Data | Source | Example |
|------|------|------|
| `FigureRegion.bbox` | AI response (responses/*.json) | `[0.12, 0.08, 0.68, 0.31]` |
| `PageRecord.width` | manifest.json | `612` (points) |
| `PageRecord.height` | manifest.json | `792` (points) |

Original figure width = `bbox[2] × pageWidth` = 0.68 × 612 ≈ 416pt ≈ 5.78in
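The arithmetic above can be checked with a minimal sketch. It is written in Python purely for illustration; the function name `figure_width` is hypothetical, and it assumes `PageRecord.width` is given in PDF points (72 per inch), as the table indicates.

```python
POINTS_PER_INCH = 72.0  # PDF points per inch


def figure_width(bbox, page_width_pt):
    """Return (width in points, width in inches) for a normalized bbox.

    bbox is [x, y, width, height] with every value in the 0-1 range.
    """
    _x, _y, w, _h = bbox
    width_pt = w * page_width_pt
    return width_pt, width_pt / POINTS_PER_INCH


# Example values taken from the table above.
width_pt, width_in = figure_width([0.12, 0.08, 0.68, 0.31], 612)
print(f"{width_pt:.2f}pt = {width_in:.2f}in")  # 416.16pt = 5.78in
```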

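## Implementation plan

### Phase 1: Normalizer post-processing (no pipeline changes)

In `LaTeXNormalizer.normalizeProject()`:
1. Read `manifest.json` and `responses/*.json`
2. For each `\includegraphics{figures/pNNN-figYY.png}`:
   - Look up the corresponding `FigureRegion.bbox`
   - Compute the absolute width: `width = bbox[2] × pageWidth / 72` (inches)
   - Or express it relative to `\textwidth`: `width = bbox[2]\textwidth` (an approximation, since `bbox[2]` is a fraction of the page width rather than of the text width)
   - Replace it with, e.g., `\includegraphics[width=0.68\textwidth]{figures/pNNN-figYY.png}`
3. Idempotent: entries that already have `[width=...]` are not processed again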
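The rewrite step and its idempotency can be sketched as follows. This is an illustrative Python sketch, not the Swift implementation: `add_figure_widths` and the `width_fractions` mapping (figure path to `bbox[2]`) are hypothetical names, and how a `pNNN-figYY.png` file is matched to its `FigureRegion` is left out because the issue does not specify it.

```python
import re

# Matches only \includegraphics immediately followed by "{", i.e. commands
# that carry no optional [..] argument yet. This is what makes a second
# pass a no-op.
BARE_INCLUDE_RE = re.compile(r"\\includegraphics\{(figures/[^}]+)\}")


def add_figure_widths(tex, width_fractions):
    """Add [width=<fraction>\\textwidth] to bare \\includegraphics commands.

    width_fractions maps a figure path to its bbox width fraction (assumed
    to be looked up from responses/*.json beforehand).
    """
    def replace(match):
        path = match.group(1)
        fraction = width_fractions.get(path)
        if fraction is None:
            return match.group(0)  # no bbox known: leave the command as-is
        return f"\\includegraphics[width={fraction:.2f}\\textwidth]{{{path}}}"

    return BARE_INCLUDE_RE.sub(replace, tex)


source = r"\includegraphics{figures/p012-fig01.png}"
fractions = {"figures/p012-fig01.png": 0.68}
once = add_figure_widths(source, fractions)
twice = add_figure_widths(once, fractions)
print(once)
```

Matching only the bare form keeps the idempotency check inside the pattern itself, so commands that already carry any optional argument are never touched.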

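### Phase 2: Upstream in the pipeline (future)

Change `PageTranscriber` so that the per-page tex is generated with the correct width from the start:
- In `postProcess()`, compute the width after cropping the figure
- Include `[width=...]` directly when writing tex/page-NNNN.tex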

## Affected files

Phase 1:
- `packages/pdf-to-latex-swift/Sources/PDFToLaTeXCore/LaTeXNormalizer.swift`
- `packages/pdf-to-latex-swift/Tests/PDFToLaTeXCoreTests/LaTeXNormalizerTests.swift`

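## Testing and verification

- All 31 `\includegraphics` commands in the Hansen textbook's accumulated.tex carry a width parameter
- After compilation, figure sizes are close to the proportions in the original book
- Idempotency test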
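The first check could be automated along these lines. This is a hedged Python sketch with a hypothetical helper name, `includes_missing_width`; it only inspects the optional argument textually and does not compile the document.

```python
import re

# Matches \includegraphics with or without an optional [..] argument.
ANY_INCLUDE_RE = re.compile(r"\\includegraphics(?:\[([^\]]*)\])?\{([^}]+)\}")


def includes_missing_width(tex):
    """Return the paths of \\includegraphics commands that set no width."""
    missing = []
    for match in ANY_INCLUDE_RE.finditer(tex):
        options = match.group(1) or ""
        if "width=" not in options:
            missing.append(match.group(2))
    return missing


sample = (
    r"\includegraphics[width=0.68\textwidth]{figures/p001-fig01.png}" "\n"
    r"\includegraphics{figures/p002-fig01.png}" "\n"
)
missing = includes_missing_width(sample)
print(missing)  # ['figures/p002-fig01.png']
```

Run against accumulated.tex after normalization, the returned list is expected to be empty.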
