docdyhr/drop2md

drop2md

CI Release Python 3.11+ License: MIT

drop2md is a macOS document-to-markdown converter that watches a drop folder and automatically converts documents to clean GFM markdown — optimized for Claude Desktop, Claude Code, Cowork, and any LLM-based workflow.

Drop a PDF, Word doc, PowerPoint, spreadsheet, or web page into the watch folder and get a clean .md file with preserved tables, extracted images, and YAML frontmatter in seconds.

Features

  • Multi-format support: PDF, DOCX, PPTX, XLSX, HTML, EPUB, PNG/JPG
  • Best-in-class converters: Marker (PDF), MarkItDown (Office), html2text (HTML)
  • Tiered PDF conversion: Marker → Docling → PyMuPDF4LLM → pdfplumber fallback
  • Table preservation: GFM pipe tables, maintained reading order
  • Image extraction: Embedded images saved separately, referenced with relative paths
  • YAML frontmatter: source, date, converter, page count, quality score — great for LLM context
  • macOS background service: launchd integration, survives restarts
  • Optional Ollama: local LLM for image captions and table validation
  • Apple Silicon: Marker uses Apple MPS for fast inference on M-series Macs
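The tiered PDF conversion listed above amounts to a fallback chain: try the best converter first and move down the list when one fails. The following is a minimal sketch of that idea, not drop2md's actual implementation. The `convert_with_fallback` function, the `Converter` signature, and the two stand-in tiers are hypothetical names introduced for illustration, and treating empty output as a failure is an assumption.

```python
from collections.abc import Callable, Sequence
from pathlib import Path

# Hypothetical converter signature: takes a PDF path, returns markdown text.
Converter = Callable[[Path], str]


def convert_with_fallback(
    pdf: Path, tiers: Sequence[tuple[str, Converter]]
) -> tuple[str, str]:
    """Try each (name, converter) in order; return (name, markdown) from the first success."""
    errors: list[str] = []
    for name, convert in tiers:
        try:
            markdown = convert(pdf)
        except Exception as exc:  # a failing tier must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if markdown.strip():  # assumption: empty output counts as a failure
            return name, markdown
        errors.append(f"{name}: empty output")
    raise RuntimeError("all converters failed: " + "; ".join(errors))


# Stand-in tiers for illustration; real tiers would wrap Marker, Docling, and so on.
def broken_tier(pdf: Path) -> str:
    raise ImportError("backend not installed")


def simple_tier(pdf: Path) -> str:
    return f"# {pdf.stem}\n"


used, text = convert_with_fallback(
    Path("report.pdf"), [("marker", broken_tier), ("pdfplumber", simple_tier)]
)
print(used)  # name of the tier that produced the output
```

Returning the tier name alongside the markdown makes it easy to record which converter ran, which is the kind of information the `converter` frontmatter field carries.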

Quick Start

Homebrew (recommended — Apple Silicon, macOS 14+)

brew tap docdyhr/tap
brew install drop2md

# Interactive first-run wizard — sets up config, service, and Quick Action
drop2md setup

pip install (Python 3.11+, Intel or Apple Silicon)

# Install system dependencies
brew install pandoc tesseract

# Install drop2md (lightweight PDF, Office, and OCR support)
pip install 'drop2md[pdf-light,office,ocr]'

# Or install with best PDF quality (~2 GB, PyTorch + Marker)
pip install 'drop2md[pdf-ml,office,ocr]'

# Configure
cp config.toml.example config.toml
# Edit config.toml: set watch_dir and output_dir

First run

# Start watching (foreground)
drop2md watch

# Or install as macOS background service
drop2md install-service

One-Shot Conversion

drop2md convert report.pdf
drop2md convert presentation.pptx --output ~/Desktop/
drop2md convert *.docx --output ~/Documents/markdown/

Configuration

Copy config.toml.example to config.toml and edit:

[paths]
watch_dir  = "~/Documents/drop-to-md"
output_dir = "~/Documents/markdown-output"

[pdf]
use_marker  = true
marker_device = "mps"   # Apple Silicon

[ollama]
enabled  = false          # set true to enable AI enhancement
provider = "ollama"       # "ollama" | "claude" | "openai" | "gemini" | "hf"
model    = "qwen3.5:latest"

See config.toml.example for all options.

macOS Service

# Install and start background service
drop2md install-service

# Snapshot: service state, config, recent conversions, process resources
drop2md status

# Live monitor — refreshes every 2 seconds (Ctrl-C to quit)
drop2md status --watch
drop2md status --watch --interval 5

# Remove service
drop2md uninstall-service

# Logs
tail -f ~/Library/Logs/drop2md/drop2md.log

The status command includes a Process Resources table showing CPU%, memory, file descriptors, and uptime for every running drop2md process (watcher, MCP server, active converters).

Finder Quick Action

Right-click any supported file in Finder and choose Quick Actions → drop2md to convert it to Markdown in place (output saved next to the original):

# Install the Quick Action
drop2md install-quick-action

# Remove it
drop2md uninstall-quick-action

After installation the action may need to be enabled in System Settings → Privacy & Security → Extensions → Finder. A macOS notification confirms conversion results.

Claude Desktop Integration

Claude Desktop can read PDFs directly with full vision support (tables, charts, images all understood). Converted markdown is still preferred because:

  • Works with Claude Code (CLI) which cannot display PDFs
  • Reduces token usage — pre-extracted text costs less than PDF vision processing
  • Works universally across all LLM tools (Obsidian, Cowork, any RAG pipeline)
  • Extracted images are stored as separate files Claude can view on demand

Output Format

Each converted file includes a YAML frontmatter block:

---
source: "report.pdf"
converted: "2026-04-04T14:23:01"
converter: "marker"
pages: 12
quality: high
---

# Report Title

## Section 1

| Column A | Column B |
|---|---|
| value 1 | value 2 |

![Figure 1](./images/report_1_0.png)
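A downstream script can use the frontmatter to filter or route converted files, for example by skipping files below a given quality. The sketch below splits a converted file into its frontmatter fields and markdown body using only the standard library. `split_frontmatter` is a hypothetical helper, and it handles only flat `key: value` pairs like those in the example above; nested or multi-line values would need a real YAML parser.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a converted file into (frontmatter fields, markdown body).

    Handles only flat `key: value` pairs; all values are returned as strings.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # opening delimiter without a closing one
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body


SAMPLE = """---
source: "report.pdf"
converted: "2026-04-04T14:23:01"
converter: "marker"
pages: 12
quality: high
---

# Report Title
"""

meta, body = split_frontmatter(SAMPLE)
print(meta["converter"], meta["pages"], meta["quality"])
```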

Supported Formats

| Format | Converter | Notes |
|---|---|---|
| PDF | Marker / Docling / pdfplumber | Tiered fallback; scanned PDFs detected automatically |
| DOCX | MarkItDown / Pandoc | Tables, headings preserved |
| PPTX | MarkItDown | Slide text + notes |
| XLSX | MarkItDown | Multiple sheets as tables |
| HTML | html2text / Pandoc | Links preserved |
| EPUB | Pandoc | Chapter structure |
| PNG/JPG | pytesseract + AI caption | OCR + optional AI description |
| RTF | Pandoc | Via OfficeConverter Pandoc fallback |
| ODT / ODP / ODS | Pandoc | OpenDocument formats |

Optional AI Enhancement

When [ollama] enabled = true, drop2md runs an optional post-processing pass:

  • Visual Enhancement Pipeline (VEP): Extracted images are classified as chart | diagram | formula | table-image | screenshot | photo and enriched with prose descriptions, Mermaid blocks, LaTeX math, or reconstructed GFM tables
  • Image captions: Images without a VEP handler get a one-sentence AI-generated alt-text
  • Table validation: Broken GFM tables are auto-corrected

A vision-capable model is required for VEP. The default llava-llama3:8b works out of the box with Ollama. Text-only models (e.g. llama3, qwen3.5) silently skip image classification.

Four providers are supported:

| Provider | Model | API key env var | Notes |
|---|---|---|---|
| ollama | llava-llama3:8b (default) | | Free, local, private |
| claude | claude-haiku-4-5-20251001 | ANTHROPIC_API_KEY | Best output quality |
| openai | gpt-4o-mini | OPENAI_API_KEY | Best speed and cost |
| gemini | gemini-2.5-flash | GEMINI_API_KEY | Free tier available |

[ollama]
enabled  = true
provider = "ollama"          # "ollama" | "claude" | "openai" | "gemini" | "hf"
model    = "llava-llama3:8b" # ollama pull llava-llama3:8b

[visual]
enabled = true               # enable the Visual Enhancement Pipeline

API keys are resolved in order: api_key field in config → DROP2MD_ENHANCE_API_KEY env var → provider-native env var.

All providers fall back gracefully — a missing key or offline service never blocks conversion.

Development

See CLAUDE.md for the full developer guide.

| Document | Contents |
|---|---|
| docs/ROADMAP.md | Versioned milestones, competitive position, architecture decisions |
| docs/PRD.md | Product requirements, personas, feature checklist |
| docs/testing.md | Full testing reference |
| docs/mcp_integration.md | Claude Desktop MCP setup |

pip install -e ".[dev,test]"
pytest
ruff check src/ tests/

License

MIT — see LICENSE
