Skip to content

Conversation

@afonsojramos
Copy link
Contributor

Summary

I've made some needed improvements to the Markdown parser's structure and error reporting. The main issues I tackled were:

  1. Reorganized the parser code with separate modules for each block type
  2. Fixed diagnostic messages for headers without spaces after hash symbols
  3. Made error handling more consistent
  4. Added a README with usage examples

Before these changes, the parser couldn't properly report when someone wrote "#Header" instead of "# Header". Now it catches these errors with the right position info and a clear message. The new structure should also make it easier for others to work on the parser going forward.

Test Plan

The implementation correctness is demonstrated by:

✅ Tests passing
✅ Invalid header reporting diagnostics
✅ Some manual testing

To verify:

  • Run cargo test -p biome_markdown_parser to confirm all tests pass
  • Check that the invalid_header.md test properly emits the diagnostic "Invalid header format: missing space after '#'"
  • Try parsing markdown with invalid headers and verify the error message and location are correct

…ens and kinds

- Rename `DemoSyntaxTreeBuilder` to `MarkdownSyntaxTreeBuilder` in markdown factory
- Extend `MarkdownSyntaxKind` with new tokens for tables, lists, blockquotes, and delimiters
- Add utility functions in `make.rs` to create various Markdown syntax tokens
- Update macro rules to include new token types
- Introduce `markdown` module in configuration crate
- Add `MarkdownConfiguration` struct with formatter, linter, and parser options
- Extend `Configuration` struct to include optional Markdown configuration
- Implement default configuration and configuration builder for Markdown
- Update `LintCommandPayload` to include Markdown linter configuration
- Modify CLI session to handle Markdown linter in command processing
- Add Markdown linter configuration import and merge logic in command runner
@github-actions github-actions bot added A-CLI Area: CLI A-Project Area: project A-Parser Area: parser labels Mar 7, 2025
…kdown block types

- Enhance parsing logic for blockquotes, headers, list items, and thematic breaks
- Add more robust whitespace and marker handling
- Improve continuation line parsing for list items
- Implement more precise checkpoint and rewind mechanisms
- Add better error recovery and validation for different block types
@codspeed-hq
Copy link

codspeed-hq bot commented Mar 7, 2025

CodSpeed Performance Report

Merging #5292 will not alter performance

Comparing afonsojramos:feat/markdown-parser (0222689) with main (08de81d)

Summary

✅ 95 untouched benchmarks

Copy link
Member

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution! I haven't reviewed the logic of the parser because there are other important things to change. Briefly:

  • revert snapshots that caused regressions
  • remove files that cause exptions
  • do not expose any configuration to our users yet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit conflicted about accepting the parsing of headers because there was another contributor who was working on it before you: #5208

I wish you could have engaged with us before jumping straight to developments. Let's see how it goes, since this PR needs some reviews.

Comment on lines 144 to 149
// For now, update snapshots automatically
insta::with_settings!({
prepend_module_to_snapshot => false,
snapshot_path => &test_directory,
// Auto-accept new snapshots during development
input_file => test_case_path.to_string_lossy().to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, let's avoid this. It's easier to accept a snapshot with another CLI command than figuring out why a snapshot was updated, because we might forget to update this code.

@dyc3 dyc3 self-requested a review March 7, 2025 19:03
…figuration imports

- Delete markdown parser test files including `parser_test.rs` and `spec_test.rs`
- Remove unused Markdown configuration import from CLI commands module
- Update spec test to use more robust debugging and tree structure printing
- Remove test files for invalid markdown headers
@github-actions github-actions bot removed the A-Project Area: project label Mar 12, 2025
…t validation

- Add direct support for thematic break literal tokens
- Enhance thematic break block parsing to handle whitespace and marker variations
- Implement stricter validation to prevent invalid thematic break content
- Add comments to clarify thematic break detection logic
…ine handling

- Refactor blockquote parsing to handle nested blockquotes and continuation lines
- Improve line parsing logic to support more complex blockquote structures
- Simplify content node creation and add more robust parsing mechanism
- Enhance end-of-blockquote detection with better line and EOF handling
…ed logic

- Simplify header block parsing by removing redundant diagnostic creation
- Enhance whitespace handling with a more concise consumption method
- Remove explicit text node creation for header content
- Improve header validation and parsing robustness
…rkers and thematic breaks

- Enhance lexer to handle list markers for stars, minus, digits, and underscores
- Implement more robust thematic break detection with flexible marker parsing
- Separate parsing logic for different marker types to improve code clarity
- Add support for various list marker formats and whitespace handling
- Simplify document parsing by removing unnecessary checks and improving block handling
- Introduce a checkpoint mechanism for better fallback to paragraph parsing
- Streamline block element recognition and improve error handling for unrecognized tokens
- Add support for '+' as a list marker and enhance overall lexer functionality
- Update tests to allow for development diagnostics and improve tree structure printing
… function names and logic

- Rename functions for clarity: `at_header_block` to `at_atx_header` and `parse_header_block` to `parse_atx_header`
- Improve header validation by ensuring whitespace follows hash symbols
- Streamline content parsing logic for ATX headers, including handling of trailing hashes
- Add a new function `at_setext_header` for future Setext header support
…alidation and diagnostics

- Introduce detailed logging for whitespace and marker checks in unordered and ordered list parsing
- Refactor list item parsing functions to include parent indentation tracking
- Improve whitespace handling after list markers and ensure valid content parsing
- Streamline continuation line handling for list items and enhance overall structure
- Consolidate paragraph parsing by removing redundant checks and simplifying inline element handling
- Introduce a checkpoint mechanism to improve content validation and error handling
- Enhance whitespace management and ensure proper handling of leading spaces and newlines
- Create a more efficient structure for paragraph item list creation and completion
@github-actions github-actions bot added the A-Tooling Area: internal tools label Mar 15, 2025
@afonsojramos
Copy link
Contributor Author

Closing this as I no longer have the time availability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-CLI Area: CLI A-Parser Area: parser A-Tooling Area: internal tools

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants