markit

Convert anything to markdown. PDF, DOCX, PPTX, XLSX, HTML, EPUB, Jupyter, RSS, images, audio, URLs, and more. Pluggable converters, built-in LLM providers for image description and audio transcription. Works as a CLI and as a library.

npm install -g markit-ai

Quick Start

# Documents
markit report.pdf
markit document.docx
markit slides.pptx

# Data
markit data.csv
markit config.json
markit schema.yaml

# Web
markit https://example.com/article
markit https://en.wikipedia.org/wiki/Markdown

# Media (via LLMs; set OPENAI_API_KEY or ANTHROPIC_API_KEY)
markit photo.jpg                          # EXIF metadata + AI description
markit recording.mp3                      # Audio metadata + transcription
markit photo.jpg -p "Extract all text"    # Custom instructions

# Write to file
markit report.pdf -o report.md

# Pipe it
markit report.pdf | pbcopy
markit data.xlsx -q | napkin create "Imported Data"

Supported Formats

| Format | Extensions | How |
|---|---|---|
| PDF | `.pdf` | Text extraction via unpdf |
| Word | `.docx` | mammoth → turndown, preserves headings/tables |
| PowerPoint | `.pptx` | XML parsing, slides + notes + tables |
| Excel | `.xlsx` | Each sheet → markdown table |
| HTML | `.html` `.htm` | turndown, scripts/styles stripped |
| EPUB | `.epub` | Spine-ordered chapters, metadata header |
| Jupyter | `.ipynb` | Markdown cells + code + outputs |
| RSS/Atom | `.rss` `.atom` `.xml` | Feed items with dates and content |
| CSV/TSV | `.csv` `.tsv` | Markdown tables |
| JSON | `.json` | Pretty-printed code block |
| YAML | `.yaml` `.yml` | Code block |
| XML/SVG | `.xml` `.svg` | Code block |
| Images | `.jpg` `.png` `.gif` `.webp` | EXIF metadata + optional AI description |
| Audio | `.mp3` `.wav` `.m4a` `.flac` | Metadata + optional AI transcription |
| ZIP | `.zip` | Recursive; converts each file inside |
| URLs | `http://` `https://` | Fetches with `Accept: text/markdown` |
| Wikipedia | `*.wikipedia.org` | Main content extraction |
| Code | `.py` `.ts` `.go` `.rs` ... | Fenced code block |
| Plain text | `.txt` `.md` `.rst` `.log` | Pass-through |

Need more? Write a plugin.

AI Features

Images and audio get metadata extraction for free. For AI-powered descriptions and transcription, set an API key:

# OpenAI (default provider)
export OPENAI_API_KEY=sk-...
markit photo.jpg

# Anthropic
markit config set llm.provider anthropic
export ANTHROPIC_API_KEY=sk-ant-...
markit photo.jpg

# Any OpenAI-compatible API (Ollama, Groq, Together, etc.)
markit config set llm.apiBase http://localhost:11434/v1

Focus the AI on what matters:

markit receipt.jpg -p "List all line items with prices as a table"
markit diagram.png -p "Describe the architecture and data flow"
markit whiteboard.jpg -p "Extract all text verbatim"

Plugins

Extend markit with new formats, override builtins, or add LLM providers.

Install

markit plugin install npm:markit-plugin-dwg
markit plugin install git:github.com/user/markit-plugin-ocr
markit plugin install ./my-plugin.ts
markit plugin list
markit plugin remove dwg

Write a Plugin

A plugin is a function that receives an API and registers converters and/or providers:

import type { MarkitPluginAPI } from "markit-ai";

export default function(api: MarkitPluginAPI) {
  api.setName("cad");
  api.setVersion("1.0.0");

  // Register a converter for a new format
  api.registerConverter(
    {
      name: "dwg",
      accepts: (info) => [".dwg", ".dxf"].includes(info.extension || ""),
      convert: async (input, info) => {
        // Your conversion logic
        return { markdown: "..." };
      },
    },
    // Optional: declare the format so it shows in `markit formats`
    { name: "AutoCAD", extensions: [".dwg", ".dxf"] },
  );
}

Plugin converters run before builtins, so you can override any format:

export default function(api: MarkitPluginAPI) {
  api.setName("better-pdf");

  // This replaces the built-in PDF converter
  api.registerConverter({
    name: "pdf",
    accepts: (info) => info.extension === ".pdf",
    convert: async (input, info) => {
      // Your superior PDF extraction
      return { markdown: "..." };
    },
  });
}
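
To make the override rule concrete, here is a minimal sketch of first-match dispatch. It is not markit's actual implementation: the `StreamInfo` and `Converter` shapes and the `pickConverter` helper are hypothetical, modeled only on the `accepts` callback shown above.

```typescript
// Hypothetical shapes for illustration; markit's real types may differ.
interface StreamInfo {
  extension?: string;
}

interface Converter {
  name: string;
  accepts: (info: StreamInfo) => boolean;
}

// First match wins: because plugin converters are listed ahead of the
// builtins, a plugin that accepts ".pdf" shadows the built-in PDF converter.
function pickConverter(
  plugins: Converter[],
  builtins: Converter[],
  info: StreamInfo,
): Converter | undefined {
  return [...plugins, ...builtins].find((c) => c.accepts(info));
}

const builtins: Converter[] = [
  { name: "builtin-pdf", accepts: (info) => info.extension === ".pdf" },
  { name: "builtin-docx", accepts: (info) => info.extension === ".docx" },
];
const plugins: Converter[] = [
  { name: "better-pdf", accepts: (info) => info.extension === ".pdf" },
];
```

Under this model a plugin only needs an `accepts` predicate that matches the same inputs as the builtin it wants to replace; formats it does not claim still fall through to the builtins.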

Plugins can also register LLM providers:

api.registerProvider({
  name: "gemini",
  envKeys: ["GOOGLE_API_KEY"],
  defaultBase: "https://generativelanguage.googleapis.com/v1beta",
  defaultModel: "gemini-2.0-flash",
  create: (config, prompt) => ({
    describe: async (image, mime) => { /* ... */ },
  }),
});

For Agents

Every command supports --json. Use -q for raw markdown with nothing else.

markit report.pdf --json       # Structured output for parsing
markit report.pdf -q           # Raw markdown, nothing else
markit onboard                 # Add instructions to CLAUDE.md
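
An agent consuming `--json` output will typically validate it before use. The sketch below assumes the JSON is an object with a string `markdown` field, mirroring the SDK result; that shape is an assumption, not something this README specifies, so inspect real output before relying on it. `extractMarkdown` is a hypothetical helper.

```typescript
// Assumed shape: a JSON object with a string `markdown` field. This mirrors
// the SDK result but is not confirmed for the CLI; check real output first.
function extractMarkdown(jsonText: string): string {
  const parsed: unknown = JSON.parse(jsonText);
  if (
    typeof parsed === "object" &&
    parsed !== null &&
    "markdown" in parsed &&
    typeof (parsed as { markdown: unknown }).markdown === "string"
  ) {
    return (parsed as { markdown: string }).markdown;
  }
  throw new Error("Unexpected JSON shape: no string `markdown` field");
}
```

Failing loudly on an unexpected shape keeps a pipeline from silently passing empty content downstream.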

SDK

markit is also a library:

import { Markit } from "markit-ai";

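const markit = new Markit();

// Each call resolves to an object with a `markdown` field.
// Distinct names avoid redeclaring `markdown` in the same scope.
const { markdown: fromFile } = await markit.convertFile("report.pdf");
const { markdown: fromUrl } = await markit.convertUrl("https://example.com");
// `buffer` holds the document bytes you already have in memory.
const { markdown: fromBuffer } = await markit.convert(buffer, { extension: ".docx" });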
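
For batch jobs, a small wrapper can convert many files at once. This is a sketch that assumes only what the snippet above shows, namely that `convertFile` resolves to an object with a `markdown` field; `ConvertFile` and `convertAll` are hypothetical names, not part of the SDK.

```typescript
// Minimal structural type: only the part of the SDK result used here.
type ConvertFile = (path: string) => Promise<{ markdown: string }>;

// Convert several files concurrently and record a per-file outcome, so one
// failing document does not abort the whole batch.
async function convertAll(
  paths: string[],
  convertFile: ConvertFile,
): Promise<Map<string, string | Error>> {
  const entries = await Promise.all(
    paths.map(async (path): Promise<[string, string | Error]> => {
      try {
        const { markdown } = await convertFile(path);
        return [path, markdown];
      } catch (err) {
        return [path, err instanceof Error ? err : new Error(String(err))];
      }
    }),
  );
  return new Map(entries);
}
```

With a real instance this would be called as `convertAll(["a.pdf", "b.docx"], (p) => markit.convertFile(p))`. Taking the converter as a parameter also makes the wrapper easy to test with a stub.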

For AI features, pass plain functions; any provider works:

import OpenAI from "openai";
import { Markit } from "markit-ai";

const openai = new OpenAI();

const markit = new Markit({
  describe: async (image, mime) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4.1-nano",
      messages: [{ role: "user", content: [
        { type: "text", text: "Describe this image." },
        { type: "image_url", image_url: { url: `data:${mime};base64,${image.toString("base64")}` } },
      ]}],
    });
    return res.choices[0].message.content ?? "";
  },
  transcribe: async (audio, mime) => {
    const res = await openai.audio.transcriptions.create({
      model: "gpt-4o-mini-transcribe",
      file: new File([audio], "audio.mp3", { type: mime }),
    });
    return res.text;
  },
});

Mix providers: Claude for vision, OpenAI for audio, or any other combination:

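import Anthropic from "@anthropic-ai/sdk";
import { Markit } from "markit-ai";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const markit = new Markit({
  describe: async (image, mime) => {
    const res = await anthropic.messages.create({
      model: "claude-haiku-4-5",
      max_tokens: 1024, // required by the Messages API; adjust as needed
      messages: [{ role: "user", content: [
        { type: "image", source: { type: "base64", media_type: mime, data: image.toString("base64") } },
        { type: "text", text: "Describe this image." },
      ]}],
    });
    // Content blocks are a union type; only text blocks carry `.text`.
    const block = res.content[0];
    return block?.type === "text" ? block.text : "";
  },
  transcribe: async (audio, mime) => { /* Whisper, Deepgram, AssemblyAI, ... */ },
});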

Or use the built-in providers, no SDK needed:

import { Markit, createLlmFunctions, loadConfig } from "markit-ai";

const config = loadConfig(); // reads .markit/config.json + env vars
const markit = new Markit(createLlmFunctions(config));

With plugins:

import { Markit, createLlmFunctions, loadConfig, loadAllPlugins } from "markit-ai";

const config = loadConfig();
const plugins = await loadAllPlugins();
const markit = new Markit(createLlmFunctions(config), plugins);

Configuration

markit init                              # Create .markit/config.json
markit config show                       # Show resolved settings
markit config get llm.model              # Get a value
markit config set llm.provider anthropic # Switch provider
markit config set llm.apiKey sk-...      # Set a value

.markit/config.json:

{
  "llm": {
    "provider": "openai",
    "apiBase": "https://api.openai.com/v1",
    "apiKey": "sk-...",
    "model": "gpt-4.1-nano",
    "transcriptionModel": "gpt-4o-mini-transcribe"
  }
}

Env vars override config. Each provider checks its own env vars first:

| Provider | Env vars | Default model |
|---|---|---|
| `openai` | `OPENAI_API_KEY`, `MARKIT_API_KEY` | `gpt-4.1-nano` |
| `anthropic` | `ANTHROPIC_API_KEY`, `MARKIT_API_KEY` | `claude-haiku-4-5` |
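
One plausible reading of this precedence, sketched as code: try the provider's env vars in the order listed in the table, then fall back to `llm.apiKey` from the config file. `resolveApiKey` and `ENV_KEYS` are hypothetical names, and markit's real lookup may differ in detail.

```typescript
type Env = Record<string, string | undefined>;

// Env var names per provider, in the order listed in the table above.
const ENV_KEYS: Record<string, string[]> = {
  openai: ["OPENAI_API_KEY", "MARKIT_API_KEY"],
  anthropic: ["ANTHROPIC_API_KEY", "MARKIT_API_KEY"],
};

// Return the first non-empty env var for the provider, falling back to the
// `llm.apiKey` value from the config file when no env var is set.
function resolveApiKey(
  provider: string,
  env: Env,
  configApiKey?: string,
): string | undefined {
  for (const key of ENV_KEYS[provider] ?? []) {
    const value = env[key];
    if (value) return value;
  }
  return configApiKey;
}
```

In a Node process this would be called as `resolveApiKey("openai", process.env, config.llm?.apiKey)`, where `config` stands for whatever object holds the parsed config file.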

CLI Reference

markit <source>                          # Convert file or URL
markit <source> -o output.md             # Write to file
markit <source> -p "instructions"        # Custom AI prompt
markit <source> --json                   # JSON output
markit <source> -q                       # Raw markdown only
cat file.pdf | markit -                  # Read from stdin
markit formats                           # List supported formats
markit init                              # Create .markit/ config
markit config show                       # Show settings
markit config get <key>                  # Get config value
markit config set <key> <value>          # Set config value
markit plugin install <source>           # Install plugin
markit plugin list                       # List plugins
markit plugin remove <name>              # Remove plugin
markit onboard                           # Add to CLAUDE.md

Development

bun install
bun run dev -- report.pdf
bun test
bun run check

License

MIT
