
Add extensible content parser with Markpub support #3

Closed
pfefferle wants to merge 5 commits into trunk from add/content-parser


Conversation

@pfefferle
Member

Summary

  • Adds a pluggable content parser system for the content union field in site.standard.document records
  • Ships a Markpub parser (at.markpub.markdown) as the default, which walks Gutenberg blocks via parse_blocks() and converts each to CommonMark markdown
  • Adds a ?atproto preview endpoint on singular posts to inspect the document record JSON (requires edit_posts capability)

Extensibility

Three filters provide full control:

| Filter | Purpose |
| --- | --- |
| atmosphere_content_parser | Swap the parser instance or return null to disable |
| atmosphere_document_content | Modify the parsed content object after parsing |
| atmosphere_html_to_markdown | Override the markdown conversion inside Markpub |

Custom parsers implement Atmosphere\Content_Parser\Content_Parser and return any AT Protocol content format.
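
As a rough illustration of how the first two filters might be used, here is a minimal sketch. The guarded stand-ins for `add_filter()`, `apply_filters()`, and `__return_null()` exist only so the snippet runs outside WordPress (they ignore priority). The `apply_filters()` call sites, the array shape of the content object, and the `exampleNote` key are assumptions for illustration, not the plugin's actual code.

```php
<?php
// Stand-ins so the sketch runs outside WordPress. Inside WordPress the real
// add_filter(), apply_filters(), and __return_null() already exist and are used.
if ( ! function_exists( 'add_filter' ) ) {
    $GLOBALS['sketch_filters'] = array();

    function add_filter( $hook, $callback, $priority = 10, $accepted_args = 1 ) {
        // Simplification: priority is ignored, callbacks run in registration order.
        $GLOBALS['sketch_filters'][ $hook ][] = array( $callback, $accepted_args );
        return true;
    }

    function apply_filters( $hook, $value, ...$args ) {
        $callbacks = $GLOBALS['sketch_filters'][ $hook ] ?? array();
        foreach ( $callbacks as list( $callback, $accepted_args ) ) {
            $params = array_slice( array_merge( array( $value ), $args ), 0, $accepted_args );
            $value  = call_user_func_array( $callback, $params );
        }
        return $value;
    }

    function __return_null() {
        return null;
    }
}

// Disable the content field, as the test plan does with __return_null.
add_filter( 'atmosphere_content_parser', '__return_null' );

// Post-process the parsed content.
add_filter( 'atmosphere_document_content', function ( $content ) {
    // Assumption: the parsed content is an array; anything else passes through.
    if ( is_array( $content ) ) {
        $content['exampleNote'] = 'added by filter'; // Illustrative key only.
    }
    return $content;
} );

// Hypothetical call sites, shown only to make the effect of the filters visible.
$parser  = apply_filters( 'atmosphere_content_parser', new stdClass() );
$content = apply_filters( 'atmosphere_document_content', array( '$type' => 'at.markpub.markdown' ) );
```

With both filters registered, `$parser` ends up `null` and `$content` carries the extra illustrative key.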

Block types handled

Paragraph, heading (with level), image (with alt/caption), list (ordered/unordered), quote, code, preformatted, separator, group/columns containers, plus fallback for unknown blocks and classic editor content.
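
The block walk described above could look roughly like the following sketch. It is not the PR's implementation: `sketch_blocks_to_markdown()` is a hypothetical helper that takes an array shaped like `parse_blocks()` output (`blockName`, `attrs`, `innerBlocks`, `innerHTML`) and covers only headings, separators, code, containers, and a plain-text fallback. Inline formatting, images, lists, and quotes are left out for brevity.

```php
<?php
/**
 * Hypothetical helper, not the PR's code: converts an array shaped like
 * parse_blocks() output into markdown for a handful of core block types.
 */
function sketch_blocks_to_markdown( array $blocks ): string {
    $parts = array();
    $fence = str_repeat( '`', 3 ); // Code fence delimiter.

    foreach ( $blocks as $block ) {
        $name = $block['blockName'] ?? null;
        // Plain text of the block; inline markup is dropped in this sketch.
        $text = trim( html_entity_decode( strip_tags( $block['innerHTML'] ?? '' ) ) );

        switch ( $name ) {
            case 'core/heading':
                // Gutenberg omits the level attribute when it is the default (2).
                $level   = (int) ( $block['attrs']['level'] ?? 2 );
                $parts[] = str_repeat( '#', $level ) . ' ' . $text;
                break;

            case 'core/separator':
                $parts[] = '---';
                break;

            case 'core/code':
            case 'core/preformatted':
                $parts[] = $fence . "\n" . $text . "\n" . $fence;
                break;

            case 'core/group':
            case 'core/columns':
            case 'core/column':
                // Containers contribute only their children.
                $parts[] = sketch_blocks_to_markdown( $block['innerBlocks'] ?? array() );
                break;

            default:
                // Paragraphs, unknown blocks, and classic content (null blockName)
                // fall back to plain text.
                $parts[] = $text;
        }
    }

    // Drop empty fragments, such as whitespace-only freeform blocks.
    $parts = array_filter(
        $parts,
        static function ( $part ) {
            return '' !== $part;
        }
    );

    return implode( "\n\n", $parts );
}

// Usage with hand-written block arrays (in WordPress these would come from parse_blocks()).
$blocks = array(
    array(
        'blockName'   => 'core/heading',
        'attrs'       => array( 'level' => 3 ),
        'innerBlocks' => array(),
        'innerHTML'   => '<h3 class="wp-block-heading">Title</h3>',
    ),
    array(
        'blockName'   => 'core/group',
        'attrs'       => array(),
        'innerBlocks' => array(
            array(
                'blockName'   => 'core/paragraph',
                'attrs'       => array(),
                'innerBlocks' => array(),
                'innerHTML'   => '<p>Hello &amp; welcome</p>',
            ),
        ),
        'innerHTML'   => '<div class="wp-block-group"></div>',
    ),
    array(
        'blockName'   => 'core/separator',
        'attrs'       => array(),
        'innerBlocks' => array(),
        'innerHTML'   => '<hr class="wp-block-separator"/>',
    ),
);

$markdown = sketch_blocks_to_markdown( $blocks );
```

Recursing into `innerBlocks` rather than parsing the container's own HTML keeps nested content in document order and lets the same function handle arbitrarily deep group and column nesting.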

Test plan

  • Verify ?atproto preview on a published post returns valid JSON with content field
  • Verify ?atproto is not accessible when logged out
  • Verify block content converts to markdown correctly (headings, links, images, lists, code blocks)
  • Verify classic editor content falls back to inline HTML-to-markdown conversion
  • Verify atmosphere_content_parser filter with __return_null disables the content field
  • Run npm run env-test — all 39 tests pass

@pfefferle pfefferle added the enhancement New feature label Mar 21, 2026
@pfefferle pfefferle self-assigned this Mar 21, 2026
@pfefferle pfefferle force-pushed the add/content-parser branch 2 times, most recently from a17857b to c5a534b Compare March 21, 2026 11:46
@github-actions github-actions bot added the [Feature] Content Parser (Content parser for AT Protocol), [Feature] Transformer (AT Protocol record transformers), [Tests] Includes Tests (PR includes test changes), [Feature] Integrations (Third-party plugin integrations), and Docs labels Mar 21, 2026
@pfefferle pfefferle force-pushed the add/content-parser branch 2 times, most recently from ed80bb4 to 8ab66d1 Compare March 21, 2026 13:10
Introduce a pluggable content parser system for the `content` union
field in site.standard.document records. The parser converts WordPress
block content into structured AT Protocol content formats.

- Add Content_Parser interface for custom parser implementations.
- Add Markpub parser (at.markpub.markdown) that walks Gutenberg blocks
  via parse_blocks() and converts each to CommonMark markdown.
- Wire content parsing into Document transformer with two filters:
  atmosphere_content_parser (swap/disable parser) and
  atmosphere_document_content (modify parsed output).
- Add ?atproto query param preview endpoint for inspecting the
  document record JSON (requires edit_posts capability).
- Add tests for all supported block types and filter integration.

Suppress pre-existing PHPCS warnings (direct DB query in uninstall,
reserved keyword parameter in Base). Use load_template() for meta
box to properly pass $post via $args.

- Always set the required `site` field in document records, falling
  back to the site URL when no publication record exists.
- Change flavor from `commonmark` to `gfm` to accurately reflect
  the strikethrough extension used in output.
- Add `extensions` field listing supported GFM extensions.
@pfefferle pfefferle force-pushed the add/content-parser branch from 8ab66d1 to 303e6e1 Compare March 21, 2026 13:14
The integrations directory lives on trunk; this branch should not
reference it in the classmap or Atmosphere init.
@pfefferle
Member Author

Superseded by #8 (content parser interface) and #9 (Markpub parser), which split this PR into two focused PRs as discussed.

@pfefferle pfefferle closed this Mar 23, 2026
