Awesome Pages Parser

awesome-pages/parser is a modular TypeScript pipeline that converts Markdown-based awesome lists into structured JSON and other reusable artifacts.

It transforms one or more README.md files into machine-readable formats, making them ready for:

Static sites (Jamstack)
Local or client-side search
Bookmark import/export
Feed generation (RSS / JSON Feed)
SEO tools and sitemaps

Installation

Install the package from npm:

npm install @awesome-pages/parser

Or using pnpm:

pnpm add @awesome-pages/parser

Or using yarn:

yarn add @awesome-pages/parser

Quick Start

Basic Usage

Parse a markdown file and generate artifacts:

import { parse } from '@awesome-pages/parser';

const results = await parse({
  sources: [
    {
      from: 'path/to/awesome-list.md',
      outputs: [
        {
          artifact: 'domain',
          to: 'output/domain.json',
        },
        {
          artifact: 'bookmarks',
          to: 'output/bookmarks.html',
        },
      ],
    },
  ],
});

console.log(`Generated ${results.length} artifacts`);

Parse from GitHub

Parse directly from a GitHub repository:

import { parse } from '@awesome-pages/parser';

await parse({
  githubToken: process.env.GITHUB_TOKEN, // optional, for higher rate limits
  sources: [
    {
      from: 'github://sindresorhus/awesome@main:README.md',
      outputs: [
        {
          artifact: ['domain', 'index', 'bookmarks'],
          to: 'dist/{repo}.{artifact}.{ext}',
        },
      ],
    },
  ],
});
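
The `from` value above appears to follow the shape github://owner/repo@ref:path. As a rough sketch of how such a string could be split into its parts, assuming that shape holds: `GithubSource` and `parseGithubSource` are hypothetical names used for illustration and are not part of the package's API.

```typescript
// Hypothetical helper: not part of @awesome-pages/parser.
interface GithubSource {
  owner: string;
  repo: string;
  ref: string;
  path: string;
}

// Assumes the form github://<owner>/<repo>@<ref>:<path>, as in the example above.
// Returns null for anything else (for example, a local file path).
function parseGithubSource(spec: string): GithubSource | null {
  const match = /^github:\/\/([^/@:]+)\/([^/@:]+)@([^:]+):(.+)$/.exec(spec);
  if (match === null) return null;
  const [, owner = '', repo = '', ref = '', path = ''] = match;
  return { owner, repo, ref, path };
}

console.log(parseGithubSource('github://sindresorhus/awesome@main:README.md'));
```

How the parser itself resolves these strings (default branches, error handling) is not covered by this sketch.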

Generate Artifacts Programmatically

Use individual artifact generators:

import { readFile } from 'fs/promises';
import {
  parse,
  generateBookmarksHtml,
  buildIndex,
} from '@awesome-pages/parser';

// Parse the list and write the domain model to disk
await parse({
  sources: [
    {
      from: 'awesome.md',
      outputs: [{ artifact: 'domain', to: 'domain.json' }],
    },
  ],
});

// Load the domain JSON that parse() just wrote
const domain = JSON.parse(await readFile('domain.json', 'utf-8'));

// Generate bookmarks HTML
const bookmarksHtml = generateBookmarksHtml(domain);

// Build search index
const searchIndex = buildIndex(domain);

Generate JSON Schema

Get the TypeScript-generated JSON Schema for the Domain v1 format:

import { generateDomainV1JsonSchema } from '@awesome-pages/parser';

const schema = generateDomainV1JsonSchema();
console.log(JSON.stringify(schema, null, 2));

Overview

The parser reads Markdown and produces a domain model (DomainV1) validated with Zod. From that core model, it can generate multiple artifacts, each designed for a different consumer.

README.md
   ↓ parse()
DomainV1 JSON
   ↓ artifacts
 ├── index.json      (inverted index for search)
 ├── bookmarks.html  (browser import)
 ├── sitemap.xml     (SEO discovery)
 ├── rss.json        (modern JSON Feed)
 ├── rss.xml         (classic RSS 2.0)
 └── data.csv        (spreadsheet / data import)

Architecture Diagram

flowchart LR
    %% === NODES ===
    subgraph Input["Input Sources"]
        A1["Local README.md"]
        A2["GitHub (via API)"]
        A3["HTTP(S) Remote URL"]
    end

    subgraph Core["Parser Core"]
        B1["markdownToAst()"]
        B2["extractMetadata()"]
        B3["mdastToDomain()"]
        B4["validate(DomainV1Schema)"]
    end

    subgraph Outputs["Output Artifacts"]
        C1["domain.json"]
        C2["index.json"]
        C3["bookmarks.html"]
        C4["sitemap.xml"]
        C5["rss.json"]
        C6["rss.xml"]
        C7["data.csv"]
    end

    %% === FLOW ===
    A1 & A2 & A3 --> B1 --> B2 --> B3 --> B4 --> C1 & C2 & C3 & C4 & C5 & C6 & C7

    %% === STYLING ===
    classDef input fill:#E3F2FD,stroke:#2196F3,stroke-width:2px,color:#0D47A1;
    classDef core fill:#E8F5E9,stroke:#4CAF50,stroke-width:2px,color:#1B5E20;
    classDef output fill:#FFF8E1,stroke:#FFC107,stroke-width:2px,color:#795548;

    class A1,A2,A3 input;
    class B1,B2,B3,B4 core;
    class C1,C2,C3,C4,C5,C6,C7 output;

Examples

Example README.md files are available in the src/tests/fixtures/readmes/ directory. You can test the parser on them, e.g.:

tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md output.json

Available Artifacts

The parser can generate multiple types of output artifacts:

1. `domain` (JSON)

The complete domain model with all metadata, sections, and items in a structured JSON format.

2. `index` (JSON)

An inverted index of the content, useful for building navigation or client-side search.

3. `bookmarks` (HTML)

A browser-compatible bookmarks file in the Netscape Bookmark File Format. Can be imported directly into Chrome, Firefox, Edge, and other modern browsers.
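
For readers unfamiliar with the Netscape Bookmark File Format, the following is a minimal sketch of what such a file looks like when generated by hand. The `BookmarkFolder` shape and the `toBookmarksHtml` name are assumptions made for this example; the package's own `generateBookmarksHtml` takes a DomainV1 object and may emit different markup.

```typescript
// Hypothetical input shape; not the package's DomainV1 type.
interface BookmarkFolder {
  title: string;
  links: { title: string; url: string }[];
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Emit one <H3> folder per section, with one <A> entry per link.
function toBookmarksHtml(folders: BookmarkFolder[]): string {
  const lines: string[] = [
    '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
    '<TITLE>Bookmarks</TITLE>',
    '<H1>Bookmarks</H1>',
    '<DL><p>',
  ];
  for (const folder of folders) {
    lines.push(`  <DT><H3>${escapeHtml(folder.title)}</H3>`, '  <DL><p>');
    for (const link of folder.links) {
      lines.push(`    <DT><A HREF="${escapeHtml(link.url)}">${escapeHtml(link.title)}</A>`);
    }
    lines.push('  </DL><p>');
  }
  lines.push('</DL><p>');
  return lines.join('\n');
}
```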

4. `sitemap` (XML)

An XML sitemap following the Sitemap Protocol. Includes all items with valid URLs and can be submitted to search engines like Google and Bing for better indexing.
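
The idea of "items with valid URLs" can be illustrated with a short sketch that keeps only http(s) URLs and wraps them in a Sitemap Protocol `urlset`. The helper names here are hypothetical, and the validity rule (http or https only) is an assumption; the parser's own filtering may be stricter or looser.

```typescript
// Hypothetical helpers; this is not the package's sitemap generator.
function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws on strings that are not absolute URLs.
    return false;
  }
}

// Escape the five characters that XML requires to be escaped in text content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function toSitemapXml(urls: string[]): string {
  const entries = urls
    .filter(isHttpUrl)
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```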

5. `rss-json` (JSON Feed)

A feed in JSON Feed v1.1 format, a modern JSON-based alternative to RSS/Atom that is easier to parse in JavaScript applications. Each item with a URL becomes a feed entry.
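
As an illustration of the "each item with a URL becomes a feed entry" rule, here is a minimal sketch that builds a JSON Feed 1.1 object by hand. `FeedSourceItem` and `toJsonFeed` are hypothetical names, and using the URL as the entry `id` is an assumption; the fields the parser actually emits may differ.

```typescript
// Hypothetical input shape; not the package's ItemV1 type.
interface FeedSourceItem {
  title: string;
  url?: string;
  description?: string;
}

function toJsonFeed(title: string, items: FeedSourceItem[]) {
  return {
    version: 'https://jsonfeed.org/version/1.1',
    title,
    items: items
      // Items without a URL are skipped.
      .filter((item): item is FeedSourceItem & { url: string } => typeof item.url === 'string')
      .map((item) => ({
        id: item.url, // assumption: the URL doubles as a stable id
        url: item.url,
        title: item.title,
        // JSON Feed entries need content_text or content_html; fall back to the title.
        content_text: item.description ?? item.title,
      })),
  };
}

console.log(JSON.stringify(toJsonFeed('Awesome', [{ title: 'A', url: 'https://example.com/a' }]), null, 2));
```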

6. `rss-xml` (RSS 2.0)

A classic RSS 2.0 XML feed compatible with traditional feed readers like Feedly, Inoreader, and Thunderbird. Each item with a URL becomes a feed entry.

7. `csv` (CSV)

A comma-separated values file containing all items and their metadata. Ideal for spreadsheet applications, data analysis, or importing into databases.
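
Item titles and descriptions often contain commas or quotes, so a CSV writer has to quote fields. The sketch below shows RFC 4180-style quoting; the function names and the `title,url` columns in the usage line are assumptions for illustration, not the column layout the parser produces.

```typescript
// Quote a field only when it contains a comma, a double quote, or a line break;
// embedded double quotes are doubled.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join a header row and data rows into one CSV document with CRLF line endings.
function toCsv(header: string[], rows: string[][]): string {
  return [header, ...rows].map((row) => row.map(csvField).join(',')).join('\r\n');
}

console.log(toCsv(['title', 'url'], [['Tom, Jerry', 'https://example.com']]));
```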

API Reference

`parse(options: ParseOptions): Promise<ParseResultFile[]>`

Main entry point for parsing awesome lists and generating artifacts.

Parameters:

options.sources: Array of source specifications
options.githubToken: Optional GitHub token for API access
options.cache: Enable/disable caching (default: true)
options.cachePath: Custom cache directory
options.concurrency: Number of concurrent operations
options.strict: Fail on validation errors

Returns: Array of generated files with metadata

`generateDomainV1JsonSchema()`

Generates the JSON Schema definition for the Domain v1 format.

`generateBookmarksHtml(domain: DomainV1): string`

Converts a domain object into browser-compatible bookmarks HTML.

`buildIndex(domain: DomainV1): SearchIndex`

Builds an inverted search index from a domain object.
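
To make "inverted index" concrete: such an index maps each search token to the ids of the items that contain it. The following is a minimal sketch under assumed shapes; `SketchItem` and `buildInvertedIndex` are hypothetical, and the real `SearchIndex` type returned by `buildIndex` may be structured differently.

```typescript
// Hypothetical item shape, for illustration only.
interface SketchItem {
  id: string;
  title: string;
}

function buildInvertedIndex(items: SketchItem[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const item of items) {
    // Lowercase, split on non-alphanumeric characters, and dedupe tokens per item.
    const tokens = Array.from(
      new Set(item.title.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)),
    );
    for (const token of tokens) {
      const ids = index.get(token) ?? [];
      ids.push(item.id);
      index.set(token, ids);
    }
  }
  return index;
}

const sketchIndex = buildInvertedIndex([
  { id: 'a', title: 'Fast JSON parser' },
  { id: 'b', title: 'JSON Schema tools' },
]);
console.log(sketchIndex.get('json'));
```

A lookup is then a single `Map.get` per query token, which is what makes this structure suitable for client-side search.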

TypeScript Support

The library is written in TypeScript and includes complete type definitions. All types are exported for your convenience:

import type { 
  DomainV1, 
  SectionV1, 
  ItemV1,
  ParseOptions,
  SourceSpec,
  OutputTarget,
  Artifact,
  ParseResultFile,
  SearchIndex
} from '@awesome-pages/parser';

// Use types in your code
const source: SourceSpec = {
  from: 'awesome.md',
  outputs: [
    {
      artifact: 'domain',
      to: 'output.json'
    }
  ]
};

// Domain model types
const section: SectionV1 = {
  id: 'tools',
  title: 'Tools',
  parentId: null,
  depth: 1,
  order: 0,
  path: 'tools',
  descriptionHtml: null
};

Advanced Usage Example

import { parse } from '@awesome-pages/parser';

await parse({
  sources: [
    {
      from: ['github://user/repo@main:README.md'],
      outputs: [
        {
          artifact: ['domain', 'index'],
          to: 'dist/{repo}.{artifact}.json',
        },
        {
          artifact: 'bookmarks',
          to: 'dist/{repo}.bookmarks.html',
        },
        {
          artifact: 'sitemap',
          to: 'dist/{repo}.sitemap.xml',
        },
        {
          artifact: 'rss-json',
          to: 'dist/{repo}.rss.json',
        },
        {
          artifact: 'rss-xml',
          to: 'dist/{repo}.rss.xml',
        },
      ],
    },
  ],
});
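
The `to` paths above use placeholders such as {repo} and {artifact}. As a sketch of how that kind of substitution can work, assuming simple {name} tokens: `expandTemplate` is a hypothetical helper, and the parser's actual resolution rules (including how it derives each value) are not shown in this README.

```typescript
// Hypothetical helper: illustrates placeholder substitution only.
function expandTemplate(template: string, values: ReadonlyMap<string, string>): string {
  // Replace each {name} with its value; unknown placeholders are left untouched.
  return template.replace(/\{(\w+)\}/g, (whole: string, name: string) => values.get(name) ?? whole);
}

const templateValues = new Map<string, string>([
  ['repo', 'awesome'],
  ['artifact', 'index'],
]);

console.log(expandTemplate('dist/{repo}.{artifact}.json', templateValues));
// dist/awesome.index.json
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a typo in a template visible in the resulting file name.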

Development

Local Development

# Clone the repository
git clone https://github.com/awesome-pages/parser.git
cd parser

# Install dependencies
pnpm install

# Run tests
pnpm test

# Run tests in watch mode
pnpm run dev:test

# Build the library
pnpm build

# Lint and format
pnpm run lint
pnpm run format

Scripts

pnpm test — runs the tests (Vitest)
pnpm build — builds ESM and CJS bundles with TypeScript declarations
pnpm run dev:test — runs tests in watch mode
pnpm parse — runs CLI: tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md readme.domain.json
pnpm run lint — lints code with Biome
pnpm run format — formats code with Biome

Publishing

This package uses semantic-release for automated versioning and publishing. When commits are merged to the main branch:

Commit messages are analyzed to determine the version bump (major/minor/patch)
CHANGELOG.md is automatically generated
Package version is bumped in package.json
GitHub release is created with release notes
Package is published to npm

Commit Message Format:

Follow Conventional Commits:

feat: — new feature (minor version bump)
fix: — bug fix (patch version bump)
feat!: or BREAKING CHANGE: — breaking change (major version bump)
docs:, chore:, style:, refactor:, perf:, test: — no version bump

Examples:

git commit -m "feat: add support for custom cache strategies"
git commit -m "fix: handle malformed markdown sections"
git commit -m "feat!: change parse() API to accept options object"
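
The mapping from commit message to version bump listed above can be sketched as a small function. This only mirrors the rules as stated in this section; `bumpFor` is a hypothetical name, and semantic-release's real analysis is configurable (for example through .releaserc.json) and is not reproduced here.

```typescript
type Bump = 'major' | 'minor' | 'patch' | 'none';

// Classify a commit message using the rules listed in this section.
function bumpFor(message: string): Bump {
  const header = message.split('\n')[0] ?? '';
  // type, optional (scope), optional "!", then a colon
  const match = /^(\w+)(\([^)]*\))?(!)?:/.exec(header);
  const breaking = /^BREAKING CHANGE:/m.test(message) || (match !== null && match[3] === '!');
  if (breaking) return 'major';
  if (match === null) return 'none';
  if (match[1] === 'feat') return 'minor';
  if (match[1] === 'fix') return 'patch';
  return 'none';
}

console.log(bumpFor('feat: add support for custom cache strategies')); // minor
console.log(bumpFor('fix: handle malformed markdown sections')); // patch
console.log(bumpFor('feat!: change parse() API to accept options object')); // major
```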

Part of the Awesome Pages ecosystem

This parser powers the Awesome Pages toolchain:

awesome-pages/parser: converts Markdown to structured data
awesome-pages/site: static site generator using parser artifacts
awesome-pages/schema: publishes JSON Schema definitions for validation and interoperability

License

MIT
