@snapp-notes/markdown-parser

A simple Markdown parser that returns an Abstract Syntax Tree (AST) as an array of nodes, each with location information.

Installation

npm install @snapp-notes/markdown-parser

Features

  • 📝 Parse markdown into a structured AST
  • 📍 Location tracking for every node
  • 🎯 Support for common markdown elements:
    • Headers (H1-H6)
    • Code blocks with language specification
    • Bold text (** and __)
    • Italic text (* and _)
    • Inline links
    • List items
    • Plain text
  • 🚀 Built with PEG.js/Peggy for reliable parsing
  • 📦 ES Module support
  • 💪 TypeScript definitions included

Usage

Basic Example

import { parse } from '@snapp-notes/markdown-parser';

const markdown = '# Hello World\nThis is **bold** text.';
const ast = parse(markdown);

console.log(ast);

Output:

[
  {
    type: 'header',
    content: '# Hello World',
    level: 1,
    loc: { start: { offset: 0, line: 1, column: 1 }, end: { ... } }
  },
  {
    type: 'text',
    content: '\n',
    loc: { ... }
  },
  {
    type: 'text',
    content: 'This is ',
    loc: { ... }
  },
  {
    type: 'bold',
    content: '**bold**',
    loc: { ... }
  },
  {
    type: 'text',
    content: ' text.',
    loc: { ... }
  }
]
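
Because the result is a plain array of nodes, ordinary array methods are enough to inspect it. The sketch below works on hand-written nodes copied from the output above (with loc left out for brevity) rather than calling the package. countByType and toSource are hypothetical helper names, not part of the library, and toSource assumes every node's content is its raw source text, which holds for the node types shown in this output.

```javascript
// Hand-written nodes in the shape of the output above; `loc` is omitted.
const ast = [
  { type: 'header', content: '# Hello World', level: 1 },
  { type: 'text', content: '\n' },
  { type: 'text', content: 'This is ' },
  { type: 'bold', content: '**bold**' },
  { type: 'text', content: ' text.' }
];

// Hypothetical helper: count how many nodes of each type are present.
function countByType(nodes) {
  const counts = {};
  for (const node of nodes) {
    counts[node.type] = (counts[node.type] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical helper: join the raw content of every node back together.
// Assumes `content` holds the unmodified source text of each node.
function toSource(nodes) {
  return nodes.map(node => node.content).join('');
}

console.log(countByType(ast)); // { header: 1, text: 3, bold: 1 }
console.log(JSON.stringify(toSource(ast)));
```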

Parsing Headers

import { parse } from '@snapp-notes/markdown-parser';

const ast = parse('# H1\n## H2\n### H3');

// Each header node contains:
// - type: 'header'
// - content: full header text including # symbols
// - level: number (1-6)
// - loc: location information
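
One possible use of the header fields listed above is building a table of contents. The following is a minimal sketch over hand-written nodes; tableOfContents and renderOutline are hypothetical helpers, and the loc values are illustrative rather than taken from a real parser run. Since content keeps the # symbols, the helper strips them.

```javascript
// Hand-written nodes in the documented shape. Only `loc.start` is filled in,
// and its values are illustrative.
const nodes = [
  { type: 'header', content: '# H1', level: 1, loc: { start: { offset: 0, line: 1, column: 1 } } },
  { type: 'text', content: '\n' },
  { type: 'header', content: '## H2', level: 2, loc: { start: { offset: 5, line: 2, column: 1 } } },
  { type: 'text', content: '\n' },
  { type: 'header', content: '### H3', level: 3, loc: { start: { offset: 11, line: 3, column: 1 } } }
];

// Hypothetical helper: keep only header nodes and strip the leading # symbols.
function tableOfContents(ast) {
  return ast
    .filter(node => node.type === 'header')
    .map(node => ({
      level: node.level,
      title: node.content.replace(/^#+\s*/, '').trim(),
      line: node.loc.start.line
    }));
}

// Hypothetical helper: indent each entry by two spaces per nesting level.
function renderOutline(toc) {
  return toc
    .map(entry => `${'  '.repeat(entry.level - 1)}${entry.title} (line ${entry.line})`)
    .join('\n');
}

console.log(renderOutline(tableOfContents(nodes)));
```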

Parsing Code Blocks

import { parse } from '@snapp-notes/markdown-parser';

const markdown = `\`\`\`javascript
const greeting = "Hello";
console.log(greeting);
\`\`\``;

const ast = parse(markdown);

// Code node contains:
// - type: 'code'
// - content: code content (includes leading newline)
// - language: 'javascript' (or empty string if not specified)
// - loc: location information
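
Since content starts with a newline and language may be an empty string, consumers will usually want to normalise code nodes before using them. A minimal sketch on hand-written nodes follows; extractCodeBlocks is a hypothetical helper, the 'plaintext' fallback label is an arbitrary choice, and the trailing newline in the sample content is an assumption (the helper strips it only if present).

```javascript
// Hand-written code nodes in the documented shape; `loc` is omitted.
// Whether real content ends with a newline is an assumption here.
const nodes = [
  { type: 'code', language: 'javascript', content: '\nconst greeting = "Hello";\nconsole.log(greeting);\n' },
  { type: 'code', language: '', content: '\nplain block\n' }
];

// Hypothetical helper: collect code blocks with a cleaned-up source string.
function extractCodeBlocks(ast) {
  return ast
    .filter(node => node.type === 'code')
    .map(node => ({
      // Fall back to a label of our own when no language was specified.
      language: node.language || 'plaintext',
      // Drop the documented leading newline and one trailing newline, if any.
      source: node.content.replace(/^\n/, '').replace(/\n$/, '')
    }));
}

for (const block of extractCodeBlocks(nodes)) {
  console.log(`[${block.language}]`);
  console.log(block.source);
}
```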

Parsing Inline Formatting

import { parse } from '@snapp-notes/markdown-parser';

// Bold text
parse('**bold text**');  // or '__bold text__'

// Italic text
parse('*italic text*');  // or '_italic text_'

// Mixed formatting
const ast = parse('This is **bold** and *italic* text');

Parsing Links

import { parse } from '@snapp-notes/markdown-parser';

const ast = parse('[Google](https://google.com)');

// Link node contains:
// - type: 'link'
// - text: 'Google'
// - url: 'https://google.com'
// - content: '[Google](https://google.com)'
// - loc: location information
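
The separate text and url fields make it straightforward to audit the links in a document. The sketch below runs on hand-written link nodes; collectLinks and isAbsoluteHttpUrl are hypothetical helpers, and the second sample URL is made up for illustration. It relies only on the standard URL constructor, which throws for strings that are not absolute URLs.

```javascript
// Hand-written nodes in the documented shape; `loc` is omitted.
const nodes = [
  { type: 'link', text: 'Google', url: 'https://google.com', content: '[Google](https://google.com)' },
  { type: 'text', content: ' and ' },
  { type: 'link', text: 'intro', url: '/docs/intro', content: '[intro](/docs/intro)' }
];

// Hypothetical helper: true only for absolute http(s) URLs.
function isAbsoluteHttpUrl(url) {
  try {
    const parsed = new URL(url); // throws on relative or malformed input
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false;
  }
}

// Hypothetical helper: list every link with a flag for external targets.
function collectLinks(ast) {
  return ast
    .filter(node => node.type === 'link')
    .map(node => ({
      text: node.text,
      url: node.url,
      external: isAbsoluteHttpUrl(node.url)
    }));
}

console.log(collectLinks(nodes));
```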

Parsing Lists

import { parse } from '@snapp-notes/markdown-parser';

const markdown = `* Item 1
* Item 2
* Item 3`;

const ast = parse(markdown);

// List nodes contain:
// - type: 'list'
// - content: '* Item text'
// - loc: location information
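
Each list item is its own node and its content keeps the * marker, so extracting the item text means filtering by type and stripping the marker. A minimal sketch on hand-written nodes follows; listItems is a hypothetical helper, and the newline text nodes between items are an assumption about the parser output (the filter works whether or not they are present).

```javascript
// Hand-written nodes in the documented shape; `loc` is omitted.
// The '\n' text nodes between items are an assumption.
const nodes = [
  { type: 'list', content: '* Item 1' },
  { type: 'text', content: '\n' },
  { type: 'list', content: '* Item 2' },
  { type: 'text', content: '\n' },
  { type: 'list', content: '* Item 3' }
];

// Hypothetical helper: return the text of every list item without its marker.
function listItems(ast) {
  return ast
    .filter(node => node.type === 'list')
    .map(node => node.content.replace(/^\*\s+/, ''));
}

console.log(listItems(nodes)); // [ 'Item 1', 'Item 2', 'Item 3' ]
```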

Complex Document

import { parse } from '@snapp-notes/markdown-parser';

const markdown = `# My Document

This is a paragraph with **bold** and *italic* text.

Visit [my website](https://example.com) for more info.

\`\`\`python
def hello():
    print("Hello, World!")
\`\`\`

* Feature 1
* Feature 2
`;

const ast = parse(markdown);

// The AST will contain a mix of different node types
ast.forEach(node => {
  console.log(`${node.type}: ${node.content?.substring(0, 30)}...`);
});
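
A mixed AST like this is typically consumed with a switch on node.type. As an illustration, the sketch below strips the Markdown markers to produce plain text. It uses hand-written nodes rather than real parser output, toPlainText is a hypothetical helper, and the exact way the parser splits text and newlines into nodes is an assumption.

```javascript
// Hand-written nodes in the documented shape; `loc` is omitted.
const nodes = [
  { type: 'header', content: '# My Document', level: 1 },
  { type: 'text', content: '\n\n' },
  { type: 'text', content: 'This is a paragraph with ' },
  { type: 'bold', content: '**bold**' },
  { type: 'text', content: ' and ' },
  { type: 'italic', content: '*italic*' },
  { type: 'text', content: ' text. Visit ' },
  { type: 'link', text: 'my website', url: 'https://example.com', content: '[my website](https://example.com)' },
  { type: 'text', content: '.' }
];

// Hypothetical helper: remove Markdown markers, keeping only readable text.
function toPlainText(ast) {
  return ast
    .map(node => {
      switch (node.type) {
        case 'header':
          return node.content.replace(/^#+\s*/, '');
        case 'bold':
          // Strip a matching pair of ** or __ delimiters.
          return node.content.replace(/^(\*\*|__)([\s\S]*)\1$/, '$2');
        case 'italic':
          // Strip a matching pair of * or _ delimiters.
          return node.content.replace(/^([*_])([\s\S]*)\1$/, '$2');
        case 'link':
          return node.text;
        case 'list':
          return node.content.replace(/^\*\s+/, '- ');
        default:
          // 'text' and 'code' nodes are passed through unchanged.
          return node.content;
      }
    })
    .join('');
}

console.log(toPlainText(nodes));
```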

API

parse(input: string, options?: { startRule?: string }): MarkdownNode[]

Parses a markdown string and returns an array of AST nodes.

Parameters:

  • input (string): The markdown text to parse
  • options (optional): Parser options
    • startRule (optional): The grammar rule to start parsing from (default: 'start')

Returns: An array of MarkdownNode objects

Throws: SyntaxError if the input cannot be parsed
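
Because parse can throw, callers that handle untrusted input may want to wrap it. The sketch below shows one way to do that. tryParse is a hypothetical helper, and the parse function is passed in as an argument so the example runs without the package installed; in real code you would pass the imported parse. Checking error.name in addition to instanceof is an assumption intended to cover parser-generated error classes that do not inherit from the built-in SyntaxError.

```javascript
// Hypothetical helper: convert a thrown SyntaxError into a result object.
// `parseFn` stands in for the package's parse function.
function tryParse(parseFn, input) {
  try {
    return { ok: true, ast: parseFn(input) };
  } catch (error) {
    const isSyntaxError =
      error instanceof SyntaxError || (error && error.name === 'SyntaxError');
    if (isSyntaxError) {
      return { ok: false, message: error.message };
    }
    // Anything else is unexpected and should not be swallowed.
    throw error;
  }
}

// Stand-in parsers for demonstration only; they are not the real parser.
const alwaysSucceeds = input => [{ type: 'text', content: input }];
const alwaysFails = () => {
  throw new SyntaxError('Unexpected input');
};

console.log(tryParse(alwaysSucceeds, 'hello')); // ok: true, one text node
console.log(tryParse(alwaysFails, 'hello'));    // ok: false, with the message
```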

Node Types

TextNode

interface TextNode {
  type: 'text' | 'bold' | 'italic' | 'list';
  content: string;
  loc: Location;
}

Used for plain text, bold text, italic text, and list items.

HeaderNode

interface HeaderNode {
  type: 'header';
  content: string;
  level: number;  // 1-6
  loc: Location;
}

CodeNode

interface CodeNode {
  type: 'code';
  content: string;
  language?: string;
  loc: Location;
}

Note: The content includes a leading newline character.

LinkNode

interface LinkNode {
  type: 'link';
  text: string;
  url: string;
  content: string;
  loc: Location;
}

Location

interface Location {
  start: Position;
  end: Position;
}

interface Position {
  offset: number;  // Character offset from start
  line: number;    // Line number (1-based)
  column: number;  // Column number (1-based)
}
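
To make the relationship between the three Position fields concrete, the sketch below recomputes line and column from a character offset using the 1-based convention described above. positionAt is a hypothetical helper written for illustration and is not how the parser itself tracks locations. For the input from the Basic Example, offset 0 gives line 1, column 1, which matches the header node in that output.

```javascript
// Hypothetical helper: derive a Position (1-based line and column)
// from a character offset into the source string.
function positionAt(source, offset) {
  const end = Math.min(offset, source.length);
  let line = 1;
  let column = 1;
  for (let i = 0; i < end; i++) {
    if (source[i] === '\n') {
      line += 1;   // a newline moves to the start of the next line
      column = 1;
    } else {
      column += 1;
    }
  }
  return { offset: end, line, column };
}

const source = '# Hello World\nThis is **bold** text.';

console.log(positionAt(source, 0));  // { offset: 0, line: 1, column: 1 }
console.log(positionAt(source, 14)); // { offset: 14, line: 2, column: 1 }
console.log(positionAt(source, source.indexOf('**bold**'))); // { offset: 22, line: 2, column: 9 }
```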

Supported Markdown Syntax

| Element | Syntax | Example |
| --- | --- | --- |
| Header | # to ###### | # Title |
| Bold | **text** or __text__ | **bold** |
| Italic | *text* or _text_ | *italic* |
| Link | [text](url) | [Google](https://google.com) |
| Code Block | ```lang\ncode\n``` | ```js\ncode\n``` |
| List Item | * item | * Item 1 |

Limitations

  • Nested formatting (e.g., bold within italic) is not fully supported
  • Only unordered lists with * are supported
  • No support for:
    • Blockquotes
    • Tables
    • Images
    • Horizontal rules
    • Strikethrough
    • Task lists

Development

Build

Generate the parser from the grammar file:

npm run build

Testing

Run the test suite:

npm test

Watch mode for development:

npm run test:watch

Grammar

The parser is built using Peggy (formerly PEG.js). The grammar file is located at src/grammar.peggy.

To modify the parser, edit the grammar file and rebuild:

npm run build

Contributing

Contributions are welcome! Please ensure all tests pass before submitting a pull request.

npm run build
npm test

License

Copyright (c) 2025 Jakub T. Jankiewicz

Released under the MIT license.
