Add an option to produce fine-grained HTML renderers based on CommonMark's grammar #635

Merged
Witiko merged 41 commits into Witiko:main from herley-shaori:feature/issue-606-parse-html-block-types
Apr 2, 2026

Conversation

herley-shaori commented Mar 11, 2026

Summary

Implements the first task of #606 by exposing CommonMark's HTML block type differentiation through individual renderers.

A new boolean option parseHtmlBlocks (default: false) is added. When enabled alongside the html option, the parser produces type-specific renderers instead of the generic inputBlockHtmlElement and inlineHtmlTag renderers:

Block HTML renderers (each receives the name of a file containing the HTML block contents):

| Renderer | CommonMark type | Matches |
| --- | --- | --- |
| `inputBlockHtmlCommentElement` | Type 2 | `<!-- ... -->` |
| `inputBlockHtmlInstructionElement` | Type 3 | `<? ... ?>` |
| `inputBlockHtmlDeclarationElement` | Type 4 | `<! ... >` |
| `inputBlockHtmlCdataElement` | Type 5 | `<![CDATA[ ... ]]>` |
| `inputBlockHtmlSpecialElement` | Type 1 | `<script>`, `<pre>`, `<style>`, `<textarea>` |
| `inputBlockHtmlRegularElement` | Type 6 | `<div>`, `<table>`, `<form>`, etc. |
| `inputBlockHtmlAnyElement` | Type 7 | Any other complete tag on its own line |
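
To illustrate how the seven start conditions separate, here is a minimal Python sketch (not the Lua parser in markdown.dtx) that maps the first line of a candidate HTML block to the renderer names listed above. The function name `classify_block_start`, the regular expressions, and the abbreviated `BLOCK_TAGS` list are assumptions made for this example. The start conditions follow a reading of the CommonMark specification. Only the start line is inspected, so end conditions and paragraph interruption rules are ignored.

```python
import re

# Hypothetical, abbreviated subset of the CommonMark type 6 tag names.
BLOCK_TAGS = (
    "address", "article", "aside", "blockquote", "body", "details", "div",
    "dl", "fieldset", "figure", "footer", "form", "h1", "h2", "h3", "h4",
    "h5", "h6", "header", "hr", "li", "main", "nav", "ol", "p", "section",
    "table", "tbody", "td", "th", "thead", "tr", "ul",
)

# Building blocks for a complete open or closing tag (used by type 7).
ATTRIBUTE = r"""\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:[^\s"'=<>`]+|'[^']*'|"[^"]*"))?"""
OPEN_TAG = r"<[A-Za-z][A-Za-z0-9-]*(?:" + ATTRIBUTE + r")*\s*/?>"
CLOSE_TAG = r"</[A-Za-z][A-Za-z0-9-]*\s*>"
SPECIAL = r"(?:pre|script|style|textarea)"

# Rules are tried in order; CDATA must precede declarations because both
# start with "<!". Renderer names are taken from the PR description.
RULES = [
    ("inputBlockHtmlSpecialElement",                      # type 1
     re.compile(r"<" + SPECIAL + r"(?:[ \t>]|$)", re.I)),
    ("inputBlockHtmlCommentElement", re.compile(r"<!--")),           # type 2
    ("inputBlockHtmlInstructionElement", re.compile(r"<\?")),        # type 3
    ("inputBlockHtmlCdataElement", re.compile(r"<!\[CDATA\[")),      # type 5
    ("inputBlockHtmlDeclarationElement", re.compile(r"<![A-Za-z]")),  # type 4
    ("inputBlockHtmlRegularElement",                      # type 6
     re.compile(r"</?(?:" + "|".join(BLOCK_TAGS) + r")(?:[ \t>]|/>|$)", re.I)),
    ("inputBlockHtmlAnyElement",                          # type 7
     re.compile(r"(?!</?" + SPECIAL + r"(?![A-Za-z0-9-]))"
                r"(?:" + OPEN_TAG + r"|" + CLOSE_TAG + r")[ \t]*$", re.I)),
]


def classify_block_start(line):
    """Return the renderer name for an HTML block start line, or None."""
    # Up to three spaces of indentation are allowed before the block.
    stripped = re.sub(r"^ {0,3}", "", line.rstrip("\n"))
    for renderer, pattern in RULES:
        if pattern.match(stripped):
            return renderer
    return None
```

For example, `classify_block_start("<!DOCTYPE html>")` yields the declaration renderer, while a line such as `<a href="x">link</a>` yields `None` because the tag is not alone on its line.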

Inline HTML renderers (each receives the tag contents as a string):

| Renderer | Matches |
| --- | --- |
| `inlineHtmlInstruction` | `<? ... ?>` |
| `inlineHtmlDeclaration` | `<! ... >` |
| `inlineHtmlCdataSection` | `<![CDATA[ ... ]]>` |
| `inlineHtmlOpenTag` | `<tag>` |
| `inlineHtmlCloseTag` | `</tag>` |
| `inlineHtmlEmptyTag` | `<tag/>` |
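
The inline distinction can be sketched in the same way. The Python snippet below is an illustration only, not the parser from markdown.dtx: the function name `classify_inline_html` and the regular expressions are assumptions, and edge cases of the CommonMark raw HTML grammar (such as the degenerate comments `<!-->` and `<!--->`) are deliberately left out. It takes a string that is expected to be exactly one inline HTML construct and returns the matching renderer name from the lists in this description.

```python
import re

# Attribute and tag-name patterns, simplified from the CommonMark grammar.
ATTRIBUTE = r"""\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:[^\s"'=<>`]+|'[^']*'|"[^"]*"))?"""
TAG_START = r"<[A-Za-z][A-Za-z0-9-]*(?:" + ATTRIBUTE + r")*\s*"

# Tried in order; CDATA is checked before declarations ("<!" prefix), and
# empty tags before open tags. Renderer names come from the PR description.
INLINE_RULES = [
    ("inlineHtmlComment", re.compile(r"<!--.*?-->", re.S)),
    ("inlineHtmlInstruction", re.compile(r"<\?.*?\?>", re.S)),
    ("inlineHtmlCdataSection", re.compile(r"<!\[CDATA\[.*?\]\]>", re.S)),
    ("inlineHtmlDeclaration", re.compile(r"<![A-Za-z][^>]*>")),
    ("inlineHtmlCloseTag", re.compile(r"</[A-Za-z][A-Za-z0-9-]*\s*>")),
    ("inlineHtmlEmptyTag", re.compile(TAG_START + r"/>")),
    ("inlineHtmlOpenTag", re.compile(TAG_START + r">")),
]


def classify_inline_html(text):
    """Return the renderer name if text is a single inline HTML construct."""
    for renderer, pattern in INLINE_RULES:
        match = pattern.match(text)
        # Require the construct to span the whole input, so that trailing
        # text after the first terminator is rejected.
        if match and match.end() == len(text):
            return renderer
    return None
```

Checking `match.end()` against the input length, rather than using a full match with a lazy quantifier, keeps inputs like `<!-- a --> b -->` from being accepted as a single comment.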

The existing inlineHtmlComment renderer remains unchanged. When parseHtmlBlocks is false (default), behavior is fully backward compatible.

Changes

  • markdown.dtx: Added parseHtmlBlocks option definition, 13 new renderer registrations, 13 new Lua writer functions, conditional parser routing in DisplayHtml and InlineHtml, and documentation for all new renderers.
  • tests/support/keyval-setup.tex: Added test renderer prototypes for all 13 new renderers.
  • tests/testfiles/regression/github/issue-606-block-html-types.test: New regression test verifying type-specific renderers are produced when parseHtmlBlocks is enabled.

Test plan

  • New regression test passes across all non-ConTeXt TeX formats (luatex, lualatex, pdflatex, pdftex)
  • All 44 existing CommonMark_0.30/html_blocks tests pass (backward compatibility)
  • Both existing CommonMark_0.31.2/raw_html tests pass (backward compatibility)
  • ConTeXt tests were skipped locally because the rename utility is missing; CI should cover them

Note

This PR addresses Task 1 of #606. Task 2 (renderers corresponding to HTML nodes) would require a more substantial HTML parser and is left for a follow-up.

Continues #606.

herley and others added 2 commits March 11, 2026 18:15
…TML types

Implement the first task of issue Witiko#606: expose CommonMark's HTML block
type differentiation through individual renderers.

When the new `parseHtmlBlocks` option is enabled (default: false), the
parser produces type-specific renderers instead of the generic
`inputBlockHtmlElement` and `inlineHtmlTag` renderers:

Block HTML renderers (by CommonMark type):
- inputBlockHtmlCommentElement (type 2: HTML comments)
- inputBlockHtmlInstructionElement (type 3: processing instructions)
- inputBlockHtmlDeclarationElement (type 4: declarations)
- inputBlockHtmlCdataElement (type 5: CDATA sections)
- inputBlockHtmlSpecialElement (type 1: script/pre/style/textarea)
- inputBlockHtmlRegularElement (type 6: div/table/form etc.)
- inputBlockHtmlAnyElement (type 7: any other complete tag)

Inline HTML renderers:
- inlineHtmlInstruction (processing instructions)
- inlineHtmlDeclaration (declarations)
- inlineHtmlCdataSection (CDATA sections)
- inlineHtmlOpenTag (opening tags)
- inlineHtmlCloseTag (closing tags)
- inlineHtmlEmptyTag (self-closing tags)

The existing inlineHtmlComment renderer remains unchanged.
When parseHtmlBlocks is false (default), behavior is fully backward
compatible.

Closes Witiko#606 (task 1)

Witiko (Owner) commented Mar 11, 2026

Hi @herley-shaori, this looks great, especially for a first-time contribution. Thanks for putting in the effort!

Below, I reviewed the code. If you'd like to make the necessary changes yourself, that would be great; otherwise, I'm happy to take over the PR from here.

Review comment threads (all outdated): tests/testfiles/regression/github/issue-606-block-html-types.test (1), .gitignore (1), markdown.dtx (8)
Witiko added the lua and conversion output labels Mar 11, 2026
…option

Apply all changes requested in the code review:

- Rename `parseHtmlBlocks` (boolean) option to `htmlOutput` (string) with
  values `basic` (default) and `commonmark`, making the design more
  future-proof for potential additional values like `nodes`.
- Simplify option documentation to not list individual renderers (consistent
  with other option descriptions in the codebase).
- Fix documentation markup: replace `\Mdef` with `\mref` for cross-references
  to renderers defined elsewhere.
- Add `[raw-html]` link reference for inline HTML construct types.
- Fix Lua code indentation in InlineHtml and DisplayHtml parser sections.
- Remove unnecessary `if: format != 'context'` condition from test file.
- Update .gitignore entry from `venv/` to `tests/test-virtualenv`.
Review comment threads (all outdated): markdown.dtx (3)
Witiko (Owner) commented Mar 11, 2026

Tasks for myself in addition to the comments from the code review:

  • Set htmlOutput = "commonmark" when the experimental option has been enabled.
  • Improve the naming of writers, e.g. why block_html_comment_element when a comment is an HTML node, not an element?
    • Same for renderers and renderer prototypes: why \markdownRendererInputBlockHtmlCommentElement rather than just \markdownRendererInputBlockHtmlComment?
  • Add a code example redefining \markdownRendererInputBlockHtmlComment to the user manual, similar to the existing code example for \markdownRendererInlineComment.
  • Update CHANGES.md.

herley-shaori (Author) commented

Hi @Witiko, thanks so much for taking the time to review this! Your feedback is really valuable. I realize I still have a lot to learn — I'll step back and leave this PR to you. If there's anything I can do to help in the future, happy to contribute! 😊

Review comment threads (all outdated): markdown.dtx (2)
herley and others added 3 commits March 12, 2026 07:07
- Add \mref{markdownRendererInlineHtmlComment} to the `basic` option
  description for completeness.
- Add [html-blocks] and [raw-html] link references to the option
  documentation fragment (each \begin{markdown} fragment needs its own
  link references).
- Split renderer documentation into separate sections:
  - Rename "HTML Tag and Element Renderers" to "Basic HTML Tag and
    Element Renderers" for the generic renderers.
  - Add "CommonMark Block HTML Element Renderers" section for
    type-specific block renderers.
  - Add "CommonMark Inline HTML Renderers" section for type-specific
    inline renderers.
- Fix Lua code style: add space after Cs( in else branches of
  InlineHtml and DisplayHtml parsers for consistency.
Witiko previously approved these changes Mar 17, 2026
Witiko requested a review from lostenderman March 17, 2026 13:39
Comment thread tests/testfiles/unit/lunamark-markdown/html-output-commonmark.test Outdated
Move issue-606-block-html-types.test from regression/github/ to
unit/lunamark-markdown/html-output-commonmark.test as requested
in review feedback.
Witiko changed the title from "Add parseHtmlBlocks option for type-specific HTML renderers" to "Add htmlOutput option to produce fine-grained HTML renderers based on CommonMark's grammar" Mar 18, 2026
Witiko changed the title from "Add htmlOutput option to produce fine-grained HTML renderers based on CommonMark's grammar" to "Add an option to produce fine-grained HTML renderers based on CommonMark's grammar" Mar 18, 2026
Witiko added this to the 3.15.0 milestone Mar 18, 2026
Witiko (Owner) left a comment

@herley-shaori: I found a few other issues that still need to be addressed. If you have the time and would like to, please feel free to take care of them; otherwise, I can finish them myself.

Review comment threads: tests/support/keyval-setup.tex (3, two outdated), markdown.dtx (3, all outdated), CHANGES.md (1, outdated)
Witiko merged commit 5474fb8 into Witiko:main Apr 2, 2026
14 checks passed
Labels: conversion output, lua