feat: add html-to-md-swift converter and CLI support #14

Closed
PsychQuantClaw wants to merge 1 commit into PsychQuant:main from PsychQuantClaw:feat/html-to-md

@PsychQuantClaw

Summary

  • add new Layer 3 package packages/html-to-md-swift
  • implement HTMLConverter: DocumentConverter using SwiftSoup + streaming Markdown emission
  • add macdoc html CLI subcommand
  • populate CONVERSIONS.md with a real matrix + priority queue and mark html-to-md-swift active

Closes #13.

What the converter supports

Block-level

  • headings h1...h6
  • paragraphs
  • unordered / ordered lists
  • blockquote
  • fenced code blocks from pre > code
  • horizontal rule
  • tables
  • common wrapper/container blocks (div, section, article, main, etc.)

Inline

  • bold / emphasis / strikethrough
  • inline code
  • links
  • images
  • <br> hard-break option
  • optional raw HTML preservation for <u>, <sup>, <sub>, <mark>

Testing

Validated the new package with:

cd packages/html-to-md-swift
swift build --target HTMLToMDSwift
swift run HTMLToMDSwiftSelfTest

Self-test coverage includes:

  • headings + paragraphs
  • inline formatting + links
  • nested lists
  • code fences + language detection
  • blockquotes
  • tables
  • images + horizontal rule
  • hard breaks
  • optional HTML extensions
  • frontmatter

Notes

  • Root-level swift build is still affected by the existing clean-clone path dependency problem tracked in #12 (build: move local path dependencies to remote packages for clean clone/build); this PR does not attempt to solve it.
  • Because the current toolchain on the dev machine lacks XCTest / Testing modules, the package uses a self-test executable for verification instead of swift test in this branch.

@kiki830621 left a comment

Code Review

CRITICAL

  • packages/html-to-md-swift/Package.swift:23-28: The self-test is declared as .executableTarget rather than .testTarget, so swift test never runs these tests. The project requires 80% test coverage (common-testing.md), and every other package uses .testTarget + XCTestCase. This needs to become a standard test target.
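
For illustration, a manifest with a standard test target might look roughly like the sketch below. This is not the PR's actual Package.swift: the tools version, the platform value, the SwiftSoup version requirement, and the test target name HTMLToMDSwiftTests are all assumptions; only the HTMLToMDSwift target name and the SwiftSoup dependency come from the PR description.

```swift
// swift-tools-version:5.9
// Sketch only: versions and the test target name are assumptions.
import PackageDescription

let package = Package(
    name: "HTMLToMDSwift",
    platforms: [.macOS(.v13)],  // assumed deployment target
    products: [
        .library(name: "HTMLToMDSwift", targets: ["HTMLToMDSwift"]),
    ],
    dependencies: [
        // Version requirement is an assumption, not taken from the PR.
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0"),
    ],
    targets: [
        .target(
            name: "HTMLToMDSwift",
            dependencies: [.product(name: "SwiftSoup", package: "SwiftSoup")]
        ),
        // A .testTarget (instead of .executableTarget) is what `swift test` picks up.
        .testTarget(
            name: "HTMLToMDSwiftTests",  // hypothetical name
            dependencies: ["HTMLToMDSwift"]
        ),
    ]
)
```

With a layout like this, XCTestCase subclasses under Tests/HTMLToMDSwiftTests would be discovered by swift test, provided the toolchain ships XCTest.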

HIGH

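  • HTMLConverter.swift:17: SwiftSoup.parse(html, input.deletingLastPathComponent().absoluteString) passes file:///... as the base URL, so relative links in the HTML (e.g. href="about.html") are resolved to file:///Users/.../about.html and leak into the Markdown output. Pass an empty string "" instead to preserve the original relative paths.

  • HTMLConverter.swift: Only UTF-8 encoding is supported (String(contentsOf: input, encoding: .utf8)). In practice HTML is often ISO-8859-1 / Windows-1252, in which case this throws an I/O error outright. Suggest trying UTF-8 first and, on failure, falling back to String(contentsOf: input, usedEncoding:).

  • HTMLConverter.swift:105-107: <hr> is emitted as ---, which collides with the YAML delimiter used in --frontmatter mode, so a Markdown parser can misread it. Suggest * * * or - - - (with spaces) instead.

  • Sources/MacDocCLI/MacDoc+HTML.swift:8: Uses ParsableCommand rather than AsyncParsableCommand, which is inconsistent with the Word subcommand pattern.

  • packages/html-to-md-swift/Package.swift:5: The deployment target is declared as .macOS(.v14), while all other local packages use .v13. It should be unified to .v13.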
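
The encoding point above could be sketched as follows. This is a minimal illustration, not the code that was merged: decodeHTML and readHTML are hypothetical helper names, and instead of String(contentsOf:usedEncoding:) the sketch uses an explicit fallback chain (UTF-8, then Windows-1252, then ISO-8859-1), which is a swapped-in approach chosen because it is easy to test on raw bytes.

```swift
import Foundation

/// Hypothetical helper: decode raw HTML bytes, preferring UTF-8 and
/// falling back to the legacy single-byte encodings common on the web.
func decodeHTML(_ data: Data) -> String {
    // Strict UTF-8 first; this returns nil on invalid byte sequences.
    if let text = String(data: data, encoding: .utf8) {
        return text
    }
    // Windows-1252 is a common legacy default for HTML.
    if let text = String(data: data, encoding: .windowsCP1252) {
        return text
    }
    // ISO-8859-1 maps every byte to a code point; the lossy UTF-8 decode
    // after it is only a last-resort safety net.
    return String(data: data, encoding: .isoLatin1)
        ?? String(decoding: data, as: UTF8.self)
}

/// Hypothetical wrapper: read a file and decode it with the fallback chain.
func readHTML(at url: URL) throws -> String {
    let data = try Data(contentsOf: url)
    return decodeHTML(data)
}
```

Reading the file as Data first means a non-UTF-8 document no longer surfaces as a read error; only genuine I/O failures throw.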

MEDIUM

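  • <ol start="N">: the start attribute is ignored, so numbering always begins at 1.
  • Block-level <br> is not controlled by the --hard-breaks flag, which is inconsistent with the inline path.
  • trimTrailingNewlines uses removeLast() in-place mutation; it could use trimmingCharacters(in:) instead.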
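
Two of the items above can be illustrated with small Foundation-only sketches. Both function names and signatures are hypothetical and are not taken from HTMLConverter.swift. Note that trimmingCharacters(in: .newlines) strips both ends of a string, so the second helper trims only the trailing side to keep the original intent.

```swift
import Foundation

/// Hypothetical helper: number list items starting from the value of the
/// <ol start="N"> attribute, defaulting to 1 when it is missing or invalid.
func renderOrderedList(items: [String], startAttribute: String?) -> String {
    let start = startAttribute
        .flatMap { Int($0.trimmingCharacters(in: .whitespaces)) } ?? 1
    return items.enumerated()
        .map { offset, text in "\(start + offset). \(text)" }
        .joined(separator: "\n")
}

/// Hypothetical helper: return a copy of the string without trailing
/// newline characters, leaving leading newlines untouched.
func trimmingTrailingNewlines(_ text: String) -> String {
    var end = text.endIndex
    while end > text.startIndex {
        let previous = text.index(before: end)
        guard text[previous].isNewline else { break }
        end = previous
    }
    return String(text[..<end])
}
```

For example, renderOrderedList(items: ["a", "b"], startAttribute: "3") numbers the items 3 and 4, while a nil or non-numeric attribute falls back to 1.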

@kiki830621

Fixes applied and merged directly to main. See commit 60aec53.

Development

Successfully merging this pull request may close these issues.

feat: implement html-to-md-swift converter