Summary
Add a new Layer 3 converter package html-to-md-swift to convert HTML documents into Markdown using macdoc's streaming architecture.
Why
- docs/modular-architecture.md already identifies html-to-md-swift as a planned future converter.
- HTML → Markdown fits the existing DocumentConverter + StreamingOutput protocol cleanly.
- It is the most direct next converter after word-to-md-swift because the target format (markdown-swift) already exists.
Package / architecture
Layer 3 package
- packages/html-to-md-swift
- Swift Package product: HTMLToMDSwift
- Module: HTMLToMDSwift
Dependencies
- Layer 2: doc-converter-swift
- Layer 1 target writer: markdown-swift
- HTML parsing: SwiftSoup
This package should not import other converters.
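A manifest along these lines would express the dependency rules above. This is only a sketch: the relative paths, the sibling product names (DocConverterSwift, MarkdownSwift), the tools version, and the SwiftSoup version requirement are all assumptions to be replaced with whatever the repository actually uses.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "html-to-md-swift",
    products: [
        .library(name: "HTMLToMDSwift", targets: ["HTMLToMDSwift"]),
    ],
    dependencies: [
        // Assumed: sibling packages live next to this one under packages/.
        .package(path: "../doc-converter-swift"),
        .package(path: "../markdown-swift"),
        // Assumed version floor; pin to the project's preferred release.
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0"),
    ],
    targets: [
        .target(
            name: "HTMLToMDSwift",
            dependencies: [
                // Hypothetical product names for the Layer 2 / Layer 1 packages.
                .product(name: "DocConverterSwift", package: "doc-converter-swift"),
                .product(name: "MarkdownSwift", package: "markdown-swift"),
                .product(name: "SwiftSoup", package: "SwiftSoup"),
            ]
        ),
        .testTarget(name: "HTMLToMDSwiftTests", dependencies: ["HTMLToMDSwift"]),
    ]
)
```

No other converter package appears in the dependency list, which keeps the "should not import other converters" rule visible in the manifest itself.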
DocumentConverter shape
Implement:
- HTMLConverter: DocumentConverter
- static let sourceFormat = "html"
- convert(input:output:options:)
Implementation approach:
- parse HTML with SwiftSoup
- walk the DOM in document order
- project HTML semantics directly into Markdown-aware block / inline emission
- stream Markdown to StreamingOutput instead of building a Markdown AST
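The approach above can be sketched without any dependencies. Everything named here is a hypothetical stand-in: MarkdownSink plays the role of StreamingOutput, and HTMLNode plays the role of the parsed SwiftSoup DOM; the real types in doc-converter-swift and SwiftSoup will look different. The point is only the shape: walk in document order and write Markdown as each node is visited, with no intermediate Markdown tree.

```swift
// Hypothetical stand-in for macdoc's StreamingOutput.
protocol MarkdownSink {
    mutating func write(_ text: String)
}

// Collects output in memory so the sketch can be inspected.
struct StringSink: MarkdownSink {
    private(set) var text = ""
    mutating func write(_ chunk: String) { text += chunk }
}

// Hypothetical stand-in for a parsed DOM node.
enum HTMLNode {
    case text(String)
    case element(tag: String, children: [HTMLNode])
}

// Emits inline content; unknown inline tags pass their children through.
func emitInline<S: MarkdownSink>(_ nodes: [HTMLNode], to sink: inout S) {
    for node in nodes {
        switch node {
        case .text(let string):
            sink.write(string)
        case .element(let tag, let children):
            let marker: String
            switch tag {
            case "strong", "b": marker = "**"
            case "em", "i": marker = "*"
            case "del", "s": marker = "~~"
            default: marker = ""
            }
            sink.write(marker)
            emitInline(children, to: &sink)
            sink.write(marker)
        }
    }
}

// Emits block content in document order; bare text at block level is skipped here.
func emitBlocks<S: MarkdownSink>(_ nodes: [HTMLNode], to sink: inout S) {
    for node in nodes {
        guard case .element(let tag, let children) = node else { continue }
        switch tag {
        case "h1", "h2", "h3", "h4", "h5", "h6":
            let level = Int(tag.dropFirst()) ?? 1
            sink.write(String(repeating: "#", count: level) + " ")
            emitInline(children, to: &sink)
            sink.write("\n\n")
        case "p":
            emitInline(children, to: &sink)
            sink.write("\n\n")
        case "hr":
            sink.write("---\n\n")
        default:
            // Containers such as body or div: descend without emitting anything.
            emitBlocks(children, to: &sink)
        }
    }
}

let document: [HTMLNode] = [
    .element(tag: "h2", children: [.text("Title")]),
    .element(tag: "p", children: [
        .text("Some "),
        .element(tag: "strong", children: [.text("bold")]),
        .text(" text."),
    ]),
]
var sink = StringSink()
emitBlocks(document, to: &sink)
print(sink.text)
```

Because each handler writes straight to the sink, memory use stays proportional to the DOM rather than to a second Markdown representation.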
Initial supported mappings
Block-level
- headings (h1...h6)
- paragraphs (p)
- unordered / ordered lists (ul / ol / li)
- blockquote
- fenced code blocks from pre > code
- horizontal rule (hr)
- tables (table / tr / th / td)
- line breaks (br)
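Of the block mappings above, fenced code is the one most likely to produce broken output, because the content of pre > code may itself contain backtick runs. A minimal sketch of one way to handle that, assuming the converter chooses a fence longer than any backtick run in the content (function names are hypothetical):

```swift
/// Length of the longest run of consecutive backticks in `text`.
func longestBacktickRun(in text: String) -> Int {
    var longest = 0
    var current = 0
    for character in text {
        if character == "`" {
            current += 1
            longest = max(longest, current)
        } else {
            current = 0
        }
    }
    return longest
}

/// Wraps `code` in a fence of at least three backticks that is
/// strictly longer than any backtick run inside the code itself.
func fencedCodeBlock(_ code: String, language: String? = nil) -> String {
    let fenceLength = max(3, longestBacktickRun(in: code) + 1)
    let fence = String(repeating: "`", count: fenceLength)
    // Ensure the closing fence starts on its own line.
    let body = code.hasSuffix("\n") ? code : code + "\n"
    return fence + (language ?? "") + "\n" + body + fence + "\n"
}

print(fencedCodeBlock("let x = 1", language: "swift"))
```

Where the language tag comes from is left open here; reading it from a class such as language-swift on the code element is a common convention, but that is an assumption rather than something this proposal specifies.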
Inline
- strong / bold
- emphasis / italic
- strikethrough (del, s)
- inline code (code outside pre)
- links (a[href])
- images (img[src])
- raw text with entity decoding and whitespace normalization
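For the raw-text item, whitespace normalization could look roughly like the sketch below. It assumes HTML's ASCII whitespace set (tab, LF, FF, CR, space) and collapses each run to a single space while keeping a leading or trailing space, so that text next to an inline element does not lose its separator. Entity decoding is left to the HTML parser. The link and image helpers are equally hypothetical and do no escaping.

```swift
/// Collapses every run of HTML ASCII whitespace into a single space.
/// Iterates Unicode scalars because Swift treats "\r\n" as one Character.
func collapseWhitespace(_ text: String) -> String {
    let htmlWhitespace: Set<Unicode.Scalar> = ["\t", "\n", "\u{0C}", "\r", " "]
    var result = ""
    var previousWasSpace = false
    for scalar in text.unicodeScalars {
        if htmlWhitespace.contains(scalar) {
            if !previousWasSpace { result.unicodeScalars.append(" ") }
            previousWasSpace = true
        } else {
            result.unicodeScalars.append(scalar)
            previousWasSpace = false
        }
    }
    return result
}

/// Renders a[href] as an inline Markdown link.
func markdownLink(text: String, href: String) -> String {
    "[\(text)](\(href))"
}

/// Renders img[src] as an inline Markdown image.
func markdownImage(alt: String, src: String) -> String {
    "![\(alt)](\(src))"
}

print(collapseWhitespace("Hello \n\t  world\r\n!"))
print(markdownLink(text: "docs", href: "https://example.com"))
```

Non-breaking spaces are deliberately not collapsed in this sketch, since they are not part of HTML's ASCII whitespace set; whether the converter should treat them specially is a separate decision.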
CLI integration
Add a new macdoc subcommand group:
macdoc html input.html -o output.md
Optional follow-up subcommands can come later, but the first pass should mirror the current word UX.
Testing strategy
- package-level unit tests (80%+ coverage target)
- focused fixtures for headings, emphasis, links, lists, code, tables, blockquote, and nested structures
- end-to-end conversion from temporary .html files to Markdown strings
- whitespace normalization regression tests to avoid noisy output
Notes
If this lands, md-to-html-swift should be promoted in the conversion matrix as the reverse-path follow-up.