
PROJECT: myst-lint - MyST Markdown Linting Tool #268

@mmcky


Summary

A standalone MyST Markdown linting tool that validates syntax and provides deterministic error detection for MyST documents. It would support the translation sync pipeline and could also be used independently for quality assurance across all QuantEcon lecture repositories.

Background

During evaluation of the action-translation-sync tool (v0.6.0), human reviewer @HumphreyYang identified Markdown syntax errors in the translator output that the pipeline had not caught:

  • Missing space after #### in headings (e.g., ####Title instead of #### Title)
  • Incorrect code block delimiters
  • Math block delimiter mismatches

While we have added LLM-based syntax checking to prompts as a first line of defense, a deterministic linting tool would provide:

  1. Deterministic detection: every error a rule covers is caught on every run (vs ~90% from the LLM)
  2. No API costs
  3. Fast validation
  4. CI/pre-commit integration

Proposed Solution

Build myst-lint as a standalone tool that wraps markdownlint (5.3M monthly downloads, actively maintained) with MyST-specific extensions.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                       myst-lint                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────┐   ┌─────────────────────────────────┐  │
│  │  markdownlint   │ + │  MyST Custom Rules (optional)   │  │
│  │  (core rules)   │   │  - directive validation         │  │
│  │  - MD018 (ATX)  │   │  - role validation              │  │
│  │  - MD031 (code) │   │  - math delimiter matching      │  │
│  │  - MD040 (lang) │   │  - code-cell validation         │  │
│  └─────────────────┘   └─────────────────────────────────┘  │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│  CLI: myst-lint <file.md> [options]                         │
│  API: import { lintMyST } from "myst-lint"                  │
└─────────────────────────────────────────────────────────────┘
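A minimal sketch of how the proposed `lintMyST` entry point could compose rules, assuming a simple line-based rule interface. The `LintError` and `Rule` types, the default rule list, and the `lintMyST` signature are all hypothetical; the real tool would delegate core rules to markdownlint rather than re-implement them. The single rule included is a simplified stand-in modeled on MD047.

```typescript
// Hypothetical shape of the myst-lint programmatic API (not an existing package).
interface LintError {
  lineNumber: number; // 1-based
  ruleName: string; // e.g. "MD047" or "myst-math-delimiters"
  detail: string;
}

// A rule inspects the document's lines and calls `report` once per problem.
type Rule = (lines: string[], report: (e: LintError) => void) => void;

// Simplified stand-in modeled on MD047: the file must end with a newline.
// text.split("\n") yields a trailing "" element exactly when the text ends in "\n".
const endsWithNewline: Rule = (lines, report) => {
  const last = lines[lines.length - 1];
  if (last !== "") {
    report({ lineNumber: lines.length, ruleName: "MD047", detail: "File should end with a single newline" });
  }
};

// Core rules and optional MyST rules share one interface, so the caller
// can pass both lists in and get a single, line-sorted report back.
function lintMyST(text: string, rules: Rule[] = [endsWithNewline]): LintError[] {
  const lines = text.split("\n");
  const errors: LintError[] = [];
  for (const rule of rules) {
    rule(lines, (e) => errors.push(e));
  }
  return errors.sort((a, b) => a.lineNumber - b.lineNumber);
}
```

For example, `lintMyST("#### Title")` reports one MD047 error on line 1, while `lintMyST("#### Title\n")` returns an empty array. Keeping one rule interface is what lets the optional MyST rules be added in Phase 2 without changing callers.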

Integration Points

Source Repo ──► [myst-lint] ──► Valid input guaranteed
                    │
                    ▼
                Translator ──► [myst-lint] ──► Block sync if errors
                                    │
                                    ▼
Target Repo PR ◄── Sync ◄── Valid translated content
        │
        └──► [Evaluator] ──► Failsafe LLM check (advisory)

  1. Source validation: Lint the English source before translation, so later stages can assume valid input
  2. Output validation: Lint translator output before sync (catch translator-introduced errors)
  3. CI integration: Pre-commit hooks for lecture repositories
  4. Evaluator failsafe: LLM-based check as backup (already implemented; advisory only)
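The two lint gates in the diagram could be wired into the sync step roughly as follows. This is a sketch under stated assumptions: `gateSync`, `GateResult`, and the stand-in `toyLint` (which only checks the ####Title case) are hypothetical names, and stopping the run on source errors is an assumption; the issue only says the source is expected to be valid. The LLM evaluator is left out because it is advisory and never blocks.

```typescript
interface LintError {
  lineNumber: number;
  ruleName: string;
  detail: string;
}
type Linter = (text: string) => LintError[];

type GateResult =
  | { action: "sync" }
  | { action: "block"; stage: "source" | "translation"; errors: LintError[] };

// Hypothetical gate: lint the source first, then the translator output.
function gateSync(source: string, translated: string, lint: Linter): GateResult {
  // Gate 1: errors here already existed upstream, so the translator is not to blame.
  const sourceErrors = lint(source);
  if (sourceErrors.length > 0) {
    return { action: "block", stage: "source", errors: sourceErrors };
  }
  // Gate 2: with a clean source, any error found here was introduced by translation.
  const outputErrors = lint(translated);
  if (outputErrors.length > 0) {
    return { action: "block", stage: "translation", errors: outputErrors };
  }
  return { action: "sync" };
}

// Stand-in linter for the example: flags ATX headings with no space after the hashes.
const toyLint: Linter = (text) => {
  const errors: LintError[] = [];
  text.split("\n").forEach((line, i) => {
    if (/^#{1,6}[^#\s]/.test(line)) {
      errors.push({ lineNumber: i + 1, ruleName: "MD018", detail: "No space after hash" });
    }
  });
  return errors;
};
```

With a clean source, `gateSync("#### Title\n", "####标题\n", toyLint)` blocks at the "translation" stage, which attributes the error to the translator rather than the English lecture.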

Core markdownlint Rules (Built-in)

These rules work out of the box with MyST:

Rule   Description                                              Catches
MD018  No space after hash on atx style heading                 ####Title instead of #### Title
MD031  Fenced code blocks should be surrounded by blank lines   Fence not separated from surrounding text
MD040  Fenced code blocks should have a language specified      Missing language spec
MD047  Files should end with a single newline character         Missing final newline
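Why MD040 passes on MyST documents: a directive fence opened with {code-cell} ipython3 after the backticks carries a non-empty info string, which is all MD040 looks for. The simplified, hypothetical re-implementation below (not markdownlint's code; `missingFenceLanguage` is an illustrative name) shows the idea, including the CommonMark rule that a fence only closes on a bare line with at least as many backticks as it opened with. It ignores indented fences and tilde fences.

```typescript
interface Finding {
  lineNumber: number;
  rule: string;
  detail: string;
}

// Simplified stand-in for MD040 (backtick fences at column 0 only; tildes ignored).
function missingFenceLanguage(lines: string[]): Finding[] {
  const findings: Finding[] = [];
  let openLen = 0; // backtick count of the open fence; 0 = not inside a fence
  for (let i = 0; i < lines.length; i++) {
    const m = /^(`{3,})(.*)$/.exec(lines[i]);
    if (!m) continue;
    const ticks = m[1].length;
    const info = m[2].trim(); // e.g. "{code-cell} ipython3", "python", or ""
    if (openLen === 0) {
      openLen = ticks;
      if (info === "") {
        findings.push({ lineNumber: i + 1, rule: "MD040", detail: "Fenced code block has no language" });
      }
    } else if (ticks >= openLen && info === "") {
      openLen = 0; // a bare fence at least as long as the opener closes the block
    }
  }
  return findings;
}
```

A MyST code cell therefore produces no findings, while a bare fence with no info string is reported on its opening line.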

MyST Custom Rules (Phase 2)

Extend markdownlint with MyST-specific validation:

Rule                     Description
myst-directive-known     Check that the {directive-name} in a directive fence is recognized
myst-directive-options   Check :option: value syntax
myst-role-syntax         Validate {role}`target` syntax
myst-math-delimiters     Check that $$ math delimiters are balanced
myst-code-cell-tags      Validate code-cell tag syntax
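A sketch of myst-math-delimiters written in the shape markdownlint expects for a custom rule: an object with `names`, `description`, `tags`, `parser`, and a `function(params, onError)` that reads `params.lines`. The `RuleParams`/`RuleError` interfaces are local stand-ins for markdownlint's own types so the example runs without the package. The counting approach (toggle on every `$$` outside code fences) is an assumption; it skips every fenced block, including directive bodies, and does not handle `$$` inside inline code or escaped dollars.

```typescript
interface RuleParams {
  lines: readonly string[];
}
interface RuleError {
  lineNumber: number;
  detail?: string;
}
type OnError = (error: RuleError) => void;

const mystMathDelimiters = {
  names: ["myst-math-delimiters"],
  description: "$$ math delimiters must be balanced",
  tags: ["myst", "math"],
  parser: "none" as const, // only params.lines is needed, no token stream
  function: (params: RuleParams, onError: OnError): void => {
    let fenceLen = 0; // backticks of the enclosing code fence, 0 if none
    let openLine = 0; // line number of an unmatched $$, 0 if balanced
    params.lines.forEach((line, i) => {
      const fence = /^\s*(`{3,})(.*)$/.exec(line);
      if (fence && fenceLen === 0) {
        fenceLen = fence[1].length; // entering a code block or directive
        return;
      }
      if (fence && fence[1].length >= fenceLen && fence[2].trim() === "") {
        fenceLen = 0; // leaving it
        return;
      }
      if (fenceLen > 0) return; // $$ inside code is not math
      // Each $$ either opens a block or closes the one currently open.
      const count = line.split("$$").length - 1;
      for (let k = 0; k < count; k++) {
        openLine = openLine === 0 ? i + 1 : 0;
      }
    });
    if (openLine !== 0) {
      onError({ lineNumber: openLine, detail: "Unclosed $$ math block" });
    }
  },
};
```

Because a labelled closing line such as `$$ (eq-b)` contains a single `$$`, it closes the open block just like a bare `$$`, and a one-line `$$ a $$` counts as a matched pair.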

Implementation Plan

Phase 1: Core Tool (MVP)

  • Wrap markdownlint with MyST-friendly config
  • Disable rules that conflict with MyST syntax
  • CLI and programmatic API
  • Integration with action-translation-sync
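For the Phase 1 MyST-friendly config, one low-risk option is an allow-list: turn every rule off with markdownlint's `default` key and enable only the four core rules listed earlier. This is a sketch of that choice, not a decided configuration; the issue does not name which rules conflict with MyST, and an allow-list avoids having to enumerate them up front. The same object could be saved as a `.markdownlint.json` file for markdownlint-cli or passed as the library's `config` option. `mystLintConfig` and `enabledRules` are hypothetical names.

```typescript
// Hypothetical Phase 1 allow-list configuration for markdownlint.
const mystLintConfig: Record<string, boolean> = {
  default: false, // disable every rule that is not listed below
  MD018: true, // no-missing-space-atx: catches "####Title"
  MD031: true, // blanks-around-fences
  MD040: true, // fenced-code-language (MyST "{directive}" info strings satisfy it)
  MD047: true, // single-trailing-newline
};

// Helper for the example: which rule IDs end up enabled under this config.
function enabledRules(config: Record<string, boolean>): string[] {
  return Object.keys(config).filter((key) => key !== "default" && config[key]);
}
```

Here `enabledRules(mystLintConfig)` returns `["MD018", "MD031", "MD040", "MD047"]`. Further rules can be switched on one at a time once they are confirmed not to misfire on MyST syntax.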

Phase 2: MyST Custom Rules

  • Implement directive/role validation
  • Math delimiter checking
  • Code-cell validation
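Directive validation could start as a name lookup on every opening directive fence, covering both backtick fences and MyST's optional colon fences (:::{note}). In the sketch below, `unknownDirectives` is a hypothetical name and `KNOWN_DIRECTIVES` is a deliberately small illustrative set; a real implementation would need the full directive list for the Sphinx/MyST extensions each lecture repository loads.

```typescript
// Illustrative subset only; the real registry depends on the project's extensions.
const KNOWN_DIRECTIVES = new Set<string>(["note", "warning", "admonition", "figure", "math", "code-cell"]);

interface DirectiveIssue {
  lineNumber: number;
  name: string;
}

// Flags opening fences such as {warnign} whose directive name is not recognized.
function unknownDirectives(lines: string[], known: Set<string> = KNOWN_DIRECTIVES): DirectiveIssue[] {
  const issues: DirectiveIssue[] = [];
  for (let i = 0; i < lines.length; i++) {
    // Backtick or colon fence immediately followed by {name}.
    const m = /^\s*(?:`{3,}|:{3,})\{([^}]+)\}/.exec(lines[i]);
    if (m && !known.has(m[1].trim())) {
      issues.push({ lineNumber: i + 1, name: m[1].trim() });
    }
  }
  return issues;
}
```

A misspelled colon-fence directive such as :::{warnign} on line 5 would be reported as `{ lineNumber: 5, name: "warnign" }`, while closing fences are never matched because they have no `{name}`.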

Phase 3: CI Integration

  • Pre-commit hooks
  • GitHub Actions workflow
  • VS Code extension recommendations

Technical Details

Base Package: markdownlint v0.39.0

  • 5.3M monthly downloads
  • Actively maintained (last release Oct 2025)
  • Supports custom rules via micromark parser
  • Works with MyST out of the box (no false positives on directive fences in our testing)

Testing Performed:

$ echo "####NoSpaceHeading" | npx markdownlint-cli --stdin
stdin:1:1 MD018/no-missing-space-atx No space after hash on atx style heading

MyST directive fences ({note}, {code-cell}, etc.) are parsed as ordinary fenced code blocks with an info string, so they produce no false positives.
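The test above can be mirrored with a simplified, hypothetical stand-in for MD018 (`md018Lines` is an illustrative name, not markdownlint's implementation) that skips everything inside fences. That is also why a Python comment such as #comment in a {code-cell} is not mistaken for a heading. The same behavior presumably means Markdown inside a directive body (for example a heading inside {note}) is not checked either, which the Phase 2 rules may need to account for. Output lines imitate the markdownlint-cli format shown in the test.

```typescript
// Simplified MD018 stand-in: report "#Heading"-style lines outside code fences.
function md018Lines(name: string, text: string): string[] {
  const out: string[] = [];
  let fenceLen = 0; // backticks of the open fence, 0 when outside any fence
  text.split("\n").forEach((line, i) => {
    const fence = /^(`{3,})(.*)$/.exec(line);
    if (fence) {
      if (fenceLen === 0) {
        fenceLen = fence[1].length; // opening fence (plain code or MyST directive)
      } else if (fence[1].length >= fenceLen && fence[2].trim() === "") {
        fenceLen = 0; // closing fence
      }
      return;
    }
    // One to six hashes followed directly by a non-space, non-hash character.
    if (fenceLen === 0 && /^#{1,6}[^#\s]/.test(line)) {
      out.push(`${name}:${i + 1}:1 MD018/no-missing-space-atx No space after hash on atx style heading`);
    }
  });
  return out;
}
```

Called as `md18Lines`-style on the stdin example, `md018Lines("stdin", "####NoSpaceHeading")` yields the same single line as the CLI run, whereas a code cell containing `#comment` yields nothing.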

Priority

MEDIUM - The LLM-based syntax checking provides good coverage now. This tool would provide deterministic guarantees and enable broader use across the organization.

Related
