awesome-pages/parser is a modular TypeScript pipeline that converts Markdown-based awesome lists into structured JSON and other reusable artifacts.
It transforms one or more README.md files into machine-readable formats, making them ready for:
- Static sites (Jamstack)
- Local or client-side search
- Bookmark import/export
- Feed generation (RSS / JSON Feed)
- SEO tools and sitemaps
Install the package from NPM:
npm install @awesome-pages/parser
Or using pnpm:
pnpm add @awesome-pages/parser
Or using yarn:
yarn add @awesome-pages/parser
Parse a markdown file and generate artifacts:
import { parse } from '@awesome-pages/parser';
const results = await parse({
sources: [
{
from: 'path/to/awesome-list.md',
outputs: [
{
artifact: 'domain',
to: 'output/domain.json',
},
{
artifact: 'bookmarks',
to: 'output/bookmarks.html',
},
],
},
],
});
console.log(`Generated ${results.length} artifacts`);
Parse directly from a GitHub repository:
import { parse } from '@awesome-pages/parser';
await parse({
githubToken: process.env.GITHUB_TOKEN, // optional, for higher rate limits
sources: [
{
from: 'github://sindresorhus/awesome@main:README.md',
outputs: [
{
artifact: ['domain', 'index', 'bookmarks'],
to: 'dist/{repo}.{artifact}.{ext}',
},
],
},
],
});
Use individual artifact generators:
import {
parse,
generateBookmarksHtml,
buildIndex
} from '@awesome-pages/parser';
// Parse to get domain object
const results = await parse({
sources: [
{
from: 'awesome.md',
outputs: [{ artifact: 'domain', to: 'domain.json' }],
},
],
});
// Load the domain JSON
import { readFile } from 'fs/promises';
const domain = JSON.parse(await readFile('domain.json', 'utf-8'));
// Generate bookmarks HTML
const bookmarksHtml = generateBookmarksHtml(domain);
// Build search index
const searchIndex = buildIndex(domain);
Get the TypeScript-generated JSON Schema for the Domain v1 format:
import { generateDomainV1JsonSchema } from '@awesome-pages/parser';
const schema = generateDomainV1JsonSchema();
console.log(JSON.stringify(schema, null, 2));
The parser reads Markdown and produces a domain model (DomainV1) that is validated with Zod.
From that core model, multiple artifacts can be generated, each designed for a different consumer.
README.md
↓ parse()
DomainV1 JSON
↓ artifacts
├── index.json (inverted index for search)
├── bookmarks.html (browser import)
├── sitemap.xml (SEO discovery)
├── rss.json (modern JSON Feed)
└── rss.xml (classic RSS 2.0)
flowchart LR
%% === NODES ===
subgraph Input["Input Sources"]
A1["Local README.md"]
A2["GitHub (via API)"]
A3["HTTP(S) Remote URL"]
end
subgraph Core["Parser Core"]
B1["markdownToAst()"]
B2["extractMetadata()"]
B3["mdastToDomain()"]
B4["validate(DomainV1Schema)"]
end
subgraph Outputs["Output Artifacts"]
C1["domain.json"]
C2["index.json"]
C3["bookmarks.html"]
C4["sitemap.xml"]
C5["rss.json"]
C6["rss.xml"]
C7["data.csv"]
end
%% === FLOW ===
A1 & A2 & A3 --> B1 --> B2 --> B3 --> B4 --> C1 & C2 & C3 & C4 & C5 & C6 & C7
%% === STYLING ===
classDef input fill:#E3F2FD,stroke:#2196F3,stroke-width:2px,color:#0D47A1;
classDef core fill:#E8F5E9,stroke:#4CAF50,stroke-width:2px,color:#1B5E20;
classDef output fill:#FFF8E1,stroke:#FFC107,stroke-width:2px,color:#795548;
class A1,A2,A3 input;
class B1,B2,B3,B4 core;
class C1,C2,C3,C4,C5,C6,C7 output;
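The flow in the diagrams above (Markdown in, validated domain model out) can be illustrated with a much smaller stand-in. The sketch below is not the library's implementation: the flowchart indicates the real parser works on a Markdown AST and validates against a schema, whereas this version uses line-based regular expressions and hand-written checks. The names `parseAwesomeList`, `validateSketch`, and the `Sketch*` types are hypothetical and far simpler than DomainV1.

```typescript
// Hypothetical, simplified shapes; the real DomainV1 model has more fields.
interface SketchSection { id: string; title: string; depth: number; }
interface SketchItem { title: string; url: string; description: string | null; sectionId: string; }
interface SketchDomain { title: string | null; sections: SketchSection[]; items: SketchItem[]; }

// Turn a heading into a URL-friendly id, e.g. "Dev Tools" becomes "dev-tools".
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Line-based stand-in for the "Markdown to domain" steps of the pipeline.
function parseAwesomeList(markdown: string): SketchDomain {
  const domain: SketchDomain = { title: null, sections: [], items: [] };
  let currentSectionId: string | null = null;

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^(#{1,6})\s+(.+)$/.exec(line);
    if (heading) {
      const depth = (heading[1] ?? '').length;
      const title = (heading[2] ?? '').trim();
      if (depth === 1 && domain.title === null) {
        domain.title = title; // the first H1 is treated as the list title
      } else {
        currentSectionId = slugify(title);
        domain.sections.push({ id: currentSectionId, title, depth });
      }
      continue;
    }
    // Matches "- [Title](https://example.com) - Optional description".
    const item = /^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(?:\s+-\s+(.*))?\s*$/.exec(line);
    if (item && currentSectionId !== null) {
      const description = item[3] === undefined ? null : item[3].trim();
      domain.items.push({
        title: item[1] ?? '',
        url: item[2] ?? '',
        description,
        sectionId: currentSectionId,
      });
    }
  }
  return domain;
}

// Stand-in for the validation step: returns a list of human-readable problems.
function validateSketch(domain: SketchDomain): string[] {
  const problems: string[] = [];
  if (domain.title === null) problems.push('missing list title');
  for (const item of domain.items) {
    if (!/^https?:\/\//i.test(item.url)) problems.push(`non-HTTP URL: ${item.url}`);
  }
  return problems;
}
```

Running `parseAwesomeList` on a README with one H1, one H2, and two link bullets yields one section and two items; `validateSketch` then returns an empty array when every item has an HTTP(S) URL.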
Example README.md files are available in the src/tests/fixtures/readmes/ directory. You can test the parser on them, e.g.:
tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md output.json
The parser can generate multiple types of output artifacts:
domain.json: The complete domain model with all metadata, sections, and items in a structured JSON format.
index.json: An inverted index of the content, useful for building navigation or search functionality.
bookmarks.html: A browser-compatible bookmarks file in the Netscape Bookmark File Format. It can be imported directly into Chrome, Firefox, Edge, and other modern browsers.
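For readers unfamiliar with the Netscape Bookmark File Format, the following sketch shows its general shape: a DOCTYPE line, then nested `<DL>` lists in which `<H3>` marks a folder and `<A HREF>` marks a link. It is an illustration only and does not reproduce the output of `generateBookmarksHtml`; the `BookmarkFolder` type and `sketchBookmarksHtml` function are hypothetical.

```typescript
// Hypothetical input shape: one folder per section, one link per item.
interface BookmarkFolder {
  title: string;
  links: { title: string; url: string }[];
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a Netscape-style bookmarks document with one folder level.
function sketchBookmarksHtml(folders: BookmarkFolder[]): string {
  const lines: string[] = [
    '<!DOCTYPE NETSCAPE-Bookmark-file-1>',
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
    '<TITLE>Bookmarks</TITLE>',
    '<H1>Bookmarks</H1>',
    '<DL><p>',
  ];
  for (const folder of folders) {
    lines.push(`  <DT><H3>${escapeHtml(folder.title)}</H3>`);
    lines.push('  <DL><p>');
    for (const link of folder.links) {
      lines.push(`    <DT><A HREF="${escapeHtml(link.url)}">${escapeHtml(link.title)}</A>`);
    }
    lines.push('  </DL><p>');
  }
  lines.push('</DL><p>');
  return lines.join('\n');
}
```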
sitemap.xml: An XML sitemap following the Sitemap Protocol. It includes all items with valid URLs and can be submitted to search engines such as Google and Bing for better indexing.
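As a rough picture of what a Sitemap Protocol document contains, the sketch below emits one `<url><loc>` entry per distinct HTTP(S) URL inside a `<urlset>` root. The function name `sketchSitemapXml` is hypothetical, and the "valid URL" rule used here (an `http://` or `https://` prefix) is an assumption; the parser's own criteria may differ.

```typescript
// Escape the five characters that must be entity-encoded in XML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a minimal sitemap: skip non-HTTP(S) URLs and duplicates.
function sketchSitemapXml(urls: string[]): string {
  const seen = new Set<string>();
  const entries: string[] = [];
  for (const url of urls) {
    if (!/^https?:\/\//i.test(url) || seen.has(url)) continue;
    seen.add(url);
    entries.push(`  <url><loc>${escapeXml(url)}</loc></url>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```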
rss.json: A feed in JSON Feed v1.1 format, a modern JSON-based alternative to RSS/Atom that is easier to parse in JavaScript applications. Each item with a URL becomes a feed entry.
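The rule "each item with a URL becomes a feed entry" can be sketched as follows. JSON Feed 1.1 requires a `version` and `title` at the top level, and each entry needs an `id` plus some content. The input type `FeedSourceItem`, the function `sketchJsonFeed`, and the choice to use the URL as the entry `id` are assumptions made for this illustration, not the parser's documented behavior.

```typescript
// Hypothetical input shape for list items.
interface FeedSourceItem {
  title: string;
  url: string | null;
  description: string | null;
}

// Subset of the JSON Feed 1.1 structure used in this sketch.
interface JsonFeedItem { id: string; url: string; title: string; content_text: string; }
interface JsonFeed { version: string; title: string; items: JsonFeedItem[]; }

function sketchJsonFeed(title: string, items: FeedSourceItem[]): JsonFeed {
  const feedItems: JsonFeedItem[] = [];
  for (const item of items) {
    if (item.url === null) continue; // items without a URL are skipped
    feedItems.push({
      id: item.url, // assumption: the URL doubles as a stable identifier
      url: item.url,
      title: item.title,
      content_text: item.description ?? item.title, // entries need some content
    });
  }
  return { version: 'https://jsonfeed.org/version/1.1', title, items: feedItems };
}
```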
rss.xml: A classic RSS 2.0 XML feed compatible with traditional feed readers such as Feedly, Inoreader, and Thunderbird. Each item with a URL becomes a feed entry.
data.csv: A comma-separated values file containing all items and their metadata. Ideal for spreadsheet applications, data analysis, or importing into databases.
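When item metadata is written as CSV, descriptions often contain commas or quotes, so fields need quoting. The sketch below applies the usual RFC 4180 convention (wrap the field in double quotes and double any embedded quote). The column names in the usage note and the helper names `csvField` and `sketchCsv` are hypothetical; the actual columns of the parser's CSV output are not specified here.

```typescript
// Quote a field only when it contains a comma, a quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join a header row and data rows into CSV text with CRLF line endings.
function sketchCsv(header: string[], rows: string[][]): string {
  return [header, ...rows]
    .map((row) => row.map(csvField).join(','))
    .join('\r\n');
}
```

For example, `sketchCsv(['title', 'url', 'description'], [['Vitest', 'https://vitest.dev', 'Fast, modern runner']])` quotes only the description, because it is the only field containing a comma.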
parse(options): Main entry point for parsing awesome lists and generating artifacts.
Parameters:
- options.sources: Array of source specifications
- options.githubToken: Optional GitHub token for API access
- options.cache: Enable/disable caching (default: true)
- options.cachePath: Custom cache directory
- options.concurrency: Number of concurrent operations
- options.strict: Fail on validation errors
Returns: Array of generated files with metadata
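The usage examples earlier in this README pass source strings such as `github://sindresorhus/awesome@main:README.md` and output templates such as `dist/{repo}.{artifact}.{ext}`. How the parser interprets these internally is not documented here, so the following is only a guess at the general idea: split the source string into owner, repository, ref, and path, then substitute `{name}` placeholders in the template. `parseGithubSource` and `expandTemplate` are hypothetical helpers, not part of the package's API.

```typescript
interface GithubSource { owner: string; repo: string; ref: string; path: string; }

// Parse "github://owner/repo@ref:path"; return null for any other string.
function parseGithubSource(from: string): GithubSource | null {
  const match = /^github:\/\/([^/]+)\/([^@]+)@([^:]+):(.+)$/.exec(from);
  if (match === null) return null;
  const [, owner = '', repo = '', ref = '', path = ''] = match;
  return { owner, repo, ref, path };
}

// Replace each {name} placeholder with its value; unknown names are left as-is.
function expandTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : placeholder,
  );
}

// Example: derive an output path for the bookmarks artifact.
const exampleSource = parseGithubSource('github://sindresorhus/awesome@main:README.md');
if (exampleSource !== null) {
  const outputPath = expandTemplate('dist/{repo}.{artifact}.{ext}', {
    repo: exampleSource.repo,
    artifact: 'bookmarks',
    ext: 'html', // assumption: bookmarks are written as .html, as in bookmarks.html
  });
  console.log(outputPath);
}
```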
generateDomainV1JsonSchema(): Generates the JSON Schema definition for the Domain v1 format.
generateBookmarksHtml(domain): Converts a domain object into browser-compatible bookmarks HTML.
buildIndex(domain): Builds an inverted search index from a domain object.
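An inverted index maps each search term to the items that contain it, so a query becomes a lookup rather than a scan. The sketch below illustrates the idea with a plain `Map`; it does not reflect the structure that `buildIndex` actually returns (the exported `SearchIndex` type), and the names `IndexableItem`, `buildInvertedIndex`, and `searchIndexSketch` are hypothetical.

```typescript
// Hypothetical minimal item shape.
interface IndexableItem { id: string; title: string; description: string | null; }

// Lowercase the text and split it on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((token) => token.length > 0);
}

// Map each token to the ids of the items whose title or description contains it.
function buildInvertedIndex(items: IndexableItem[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const item of items) {
    for (const token of tokenize(`${item.title} ${item.description ?? ''}`)) {
      const ids = index.get(token) ?? [];
      if (ids.indexOf(item.id) === -1) ids.push(item.id);
      index.set(token, ids);
    }
  }
  return index;
}

// Return the ids of items that contain every token of the query (AND semantics).
function searchIndexSketch(index: Map<string, string[]>, query: string): string[] {
  const tokens = tokenize(query);
  let result: string[] | null = null;
  for (const token of tokens) {
    const ids = index.get(token) ?? [];
    result = result === null ? ids.slice() : result.filter((id) => ids.indexOf(id) !== -1);
  }
  return result ?? [];
}
```

Because postings are intersected, adding a term to a query can only narrow the result set; an empty query returns no results.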
The library is written in TypeScript and includes complete type definitions. All types are exported for your convenience:
import type {
DomainV1,
SectionV1,
ItemV1,
ParseOptions,
SourceSpec,
OutputTarget,
Artifact,
ParseResultFile,
SearchIndex
} from '@awesome-pages/parser';
// Use types in your code
const source: SourceSpec = {
from: 'awesome.md',
outputs: [
{
artifact: 'domain',
to: 'output.json'
}
]
};
// Domain model types
const section: SectionV1 = {
id: 'tools',
title: 'Tools',
parentId: null,
depth: 1,
order: 0,
path: 'tools',
descriptionHtml: null
};
import { parse } from '@awesome-pages/parser';
await parse({
sources: [
{
from: ['github://user/repo@main:README.md'],
outputs: [
{
artifact: ['domain', 'index'],
to: 'dist/{repo}.{artifact}.json',
},
{
artifact: 'bookmarks',
to: 'dist/{repo}.bookmarks.html',
},
{
artifact: 'sitemap',
to: 'dist/{repo}.sitemap.xml',
},
{
artifact: 'rss-json',
to: 'dist/{repo}.rss.json',
},
{
artifact: 'rss-xml',
to: 'dist/{repo}.rss.xml',
},
],
},
],
});
# Clone the repository
git clone https://github.com/awesome-pages/parser.git
cd parser
# Install dependencies
pnpm install
# Run tests
pnpm test
# Run tests in watch mode
pnpm run dev:test
# Build the library
pnpm build
# Lint and format
pnpm run lint
pnpm run format
- pnpm test - runs the tests (Vitest)
- pnpm build - builds ESM and CJS bundles with TypeScript declarations
- pnpm run dev:test - runs tests in watch mode
- pnpm parse - runs the CLI: tsx src/cli.ts src/tests/fixtures/readmes/awesome-click-and-use.md readme.domain.json
- pnpm run lint - lints code with Biome
- pnpm run format - formats code with Biome
This package uses semantic-release for automated versioning and publishing. When commits are merged to the main branch:
- Commit messages are analyzed to determine the version bump (major/minor/patch)
- CHANGELOG.md is automatically generated
- Package version is bumped in package.json
- GitHub release is created with release notes
- Package is published to NPM
Commit Message Format:
Follow Conventional Commits:
- feat: new feature (minor version bump)
- fix: bug fix (patch version bump)
- feat!: or BREAKING CHANGE: breaking change (major version bump)
- docs:, chore:, style:, refactor:, perf:, test: no version bump
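The mapping above can be expressed as a small function. This sketch follows the list in this README only; it is not semantic-release's own commit analyzer, whose rules are configurable and may differ. `bumpForCommit` and `bumpForRelease` are hypothetical names.

```typescript
type Bump = 'major' | 'minor' | 'patch' | null;

// Classify a single commit message according to the list above.
function bumpForCommit(message: string): Bump {
  const header = message.split('\n')[0] ?? '';
  // Matches "type: ", "type(scope): ", and the "!" breaking-change marker.
  const match = /^(\w+)(\([^)]*\))?(!)?:\s/.exec(header);
  const breaking =
    /^BREAKING CHANGE:/m.test(message) || (match !== null && match[3] === '!');
  if (breaking) return 'major';
  if (match === null) return null;
  if (match[1] === 'feat') return 'minor';
  if (match[1] === 'fix') return 'patch';
  return null; // docs, chore, style, refactor, perf, test, and anything else
}

// A release takes the highest bump found among its commits.
const bumpOrder: Bump[] = [null, 'patch', 'minor', 'major'];

function bumpForRelease(messages: string[]): Bump {
  let highest: Bump = null;
  for (const message of messages) {
    const bump = bumpForCommit(message);
    if (bumpOrder.indexOf(bump) > bumpOrder.indexOf(highest)) highest = bump;
  }
  return highest;
}
```

A batch containing only `docs:` and `chore:` commits yields `null`, meaning no release; a single `feat!:` commit anywhere in the batch yields `'major'`.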
Example:
git commit -m "feat: add support for custom cache strategies"
git commit -m "fix: handle malformed markdown sections"
git commit -m "feat!: change parse() API to accept options object"
This parser powers the Awesome Pages toolchain:
- awesome-pages/parser: converts Markdown to structured data
- awesome-pages/site: static site generator using parser artifacts
- awesome-pages/schema: publishes JSON Schema definitions for validation and interoperability
MIT