Convert any document to AI-ready Markdown in seconds

Cloud-hosted Model Context Protocol server powered by Microsoft's Markitdown
Markitdown MCP Server is a cloud-hosted service that converts documents into clean, AI-optimized Markdown. Built on Microsoft's Markitdown library (82k+ ⭐), it eliminates the need for local Python installations and provides instant, scalable document conversion through the Model Context Protocol.
Perfect for RAG pipelines, knowledge bases, AI agents, and document processing workflows.
Convert 29+ file formats to clean Markdown:
- Documents: PDF, DOCX, PPTX, XLSX
- Images: PNG, JPG, GIF (with OCR)
- Web: HTML, XML
- Audio: MP3, WAV (with transcription)
- Archives: ZIP (extract and convert contents)
- And many more!
- No Python installation needed
- No dependency management
- No local configuration
- Just call the API and get Markdown
- First-class Model Context Protocol support
- Works seamlessly with Claude Desktop, Cursor, Aider
- AI agents can discover and use it automatically
- Direct Python library integration (no subprocess overhead)
- Typical conversion: < 3 seconds
- Cloud-scale infrastructure via Apify
- $0.01 per Actor start
- $0.02 per document conversion
- No subscriptions, no minimums
- Add to MCP Configuration

  Create or edit `~/Library/Application Support/Claude/claude_desktop_config.json`:

      {
        "mcpServers": {
          "markitdown": {
            "command": "npx",
            "args": [
              "-y",
              "mcp-remote",
              "https://rector-labs--markitdown-mcp-server.apify.actor/mcp",
              "--header",
              "Authorization: Bearer YOUR_APIFY_TOKEN"
            ]
          }
        }
      }

- Restart Claude Desktop

- Convert Documents

  Simply ask Claude:

  "Convert this PDF to markdown: https://example.com/document.pdf"

  Claude will automatically use the Markitdown tool!
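
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge in Python; the function name `add_markitdown_server` is hypothetical and not part of any SDK, and the entry it writes simply mirrors the JSON shown in the configuration step.

```python
from copy import deepcopy

# Endpoint taken from the configuration step in this README.
MCP_URL = "https://rector-labs--markitdown-mcp-server.apify.actor/mcp"


def add_markitdown_server(config: dict, apify_token: str) -> dict:
    """Return a copy of `config` with the markitdown MCP entry added.

    Hypothetical helper: existing servers under "mcpServers" are kept,
    and an existing "markitdown" entry is overwritten.
    """
    updated = deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["markitdown"] = {
        "command": "npx",
        "args": [
            "-y",
            "mcp-remote",
            MCP_URL,
            "--header",
            f"Authorization: Bearer {apify_token}",
        ],
    }
    return updated
```

To use it, load the existing file with `json.load`, pass the resulting dict through the function, and write the result back with `json.dump(..., indent=2)` before restarting Claude Desktop.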
cURL:

    curl -X POST https://api.apify.com/v2/acts/rector_labs~markitdown-mcp-server/runs \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "fileUrl": "https://example.com/document.pdf"
      }'

Python:

    from apify_client import ApifyClient

    client = ApifyClient('YOUR_API_TOKEN')

    run = client.actor('rector_labs/markitdown-mcp-server').call(
        run_input={
            'fileUrl': 'https://example.com/document.pdf'
        }
    )

    # Get markdown output
    for item in client.dataset(run['defaultDatasetId']).iterate_items():
        print(item['markdown'])

JavaScript:

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    const run = await client.actor('rector_labs/markitdown-mcp-server').call({
        fileUrl: 'https://example.com/document.pdf'
    });

    // Get markdown output
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items[0].markdown);

Documents:

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text extraction, OCR support |
| Word | .docx, .doc | Preserves formatting |
| PowerPoint | .pptx, .ppt | Slide text extraction |
| Excel | .xlsx, .xls | Table to Markdown |
| CSV | .csv | Table formatting |
| TSV | .tsv | Table formatting |

Images:

| Format | Extension | Notes |
|---|---|---|
| PNG | .png | OCR text extraction |
| JPEG | .jpg, .jpeg | OCR text extraction |
| GIF | .gif | OCR text extraction |
| BMP | .bmp | OCR text extraction |

Web:

| Format | Extension | Notes |
|---|---|---|
| HTML | .html, .htm | Clean conversion |
| XML | .xml | Structured data |
| Markdown | .md | Pass-through |

Audio:

| Format | Extension | Notes |
|---|---|---|
| MP3 | .mp3 | Speech-to-text transcription |
| WAV | .wav | Speech-to-text transcription |
| YouTube | URLs | Transcript extraction |

Archives:

| Format | Extension | Notes |
|---|---|---|
| ZIP | .zip | Extract and convert contents |

PDF Documents → Markitdown → Clean Markdown → Vector DB → LLM
Perfect for preparing documents for semantic search and retrieval.
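
Between the "Clean Markdown" and "Vector DB" stages, the Markdown usually has to be split into chunks before embedding. The sketch below shows one simple way to do that, splitting on Markdown headings. The function name `split_by_heading` is hypothetical, the approach is an assumption rather than something this Actor does for you, and it does not special-case `#` lines inside code blocks.

```python
import re

# Matches ATX headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list:
    """Split Markdown into chunks, one per heading.

    Each chunk is a dict with the heading text ("title") and the body
    text under it ("text"). Text before the first heading gets an
    empty title.
    """
    chunks = []
    title, lines = "", []

    def flush():
        # Only emit a chunk if it has a heading or some non-blank text.
        if title or any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each returned chunk can then be embedded and stored with its title as metadata, which tends to make retrieved passages easier to attribute to a section of the source document.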
Convert legacy documentation (PDFs, Word docs) to modern Markdown format for wikis, documentation sites, or content management systems.
Extract text from research papers, presentations, and datasets for analysis and processing.
Convert invoices, reports, and spreadsheets into structured Markdown for further processing.
Process hundreds of documents in parallel using Apify's infrastructure.
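
A minimal sketch of client-side fan-out for such a batch, assuming each document is converted by its own call. `convert_many` and its `convert_one` argument are hypothetical names: `convert_one` stands in for whatever function wraps a single conversion (for example, the `ApifyClient` call from the Python example) and is passed in so the pattern stays independent of the client library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert_many(urls, convert_one, max_workers=8):
    """Convert several documents concurrently.

    `convert_one(url)` is a caller-supplied function that returns the
    Markdown for one URL or raises on failure. Returns two dicts keyed
    by URL: successful results and error messages.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one document fails
                errors[url] = str(exc)
    return results, errors
```

Threads are a reasonable fit here because each worker spends most of its time waiting on the network. Collecting failures separately means one bad URL does not abort the rest of the batch.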

    {
      "mcpServers": {
        "markitdown": {
          "command": "npx",
          "args": [
            "-y",
            "mcp-remote",
            "https://rector-labs--markitdown-mcp-server.apify.actor/mcp",
            "--header",
            "Authorization: Bearer YOUR_APIFY_TOKEN"
          ]
        }
      }
    }

- Add Apify node
- Select Markitdown MCP Server actor
- Configure file URL input
- Connect to downstream nodes

- Add Apify module
- Select actor: `rector_labs/markitdown-mcp-server`
- Map file URL from trigger
- Use output in next steps

- Choose Apify app
- Action: Run Actor
- Actor: `markitdown-mcp-server`
- Map data from previous steps

| Parameter | Type | Required | Description |
|---|---|---|---|
| `fileUrl` | string | ✅ (or base64) | URL of the document to convert |
| `fileBase64` | string | ✅ (or URL) | Base64-encoded file content |

Note: Provide either `fileUrl` or `fileBase64`, not both.

URL-based:

    {
      "fileUrl": "https://example.com/document.pdf"
    }

Base64-based:

    {
      "fileBase64": "JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC..."
    }

The actor outputs clean Markdown text with metadata:

    {
      "event": "conversion_success",
      "file_size": 153600,
      "markdown_length": 5234,
      "file_type": ".pdf"
    }

The Markdown content is returned as the tool response.
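
The "either `fileUrl` or `fileBase64`, not both" rule is easy to enforce on the client before a run is started. Here is a small sketch; `build_input` is a hypothetical helper name, and only the two input field names come from the parameter table.

```python
import base64
from typing import Optional


def build_input(file_url: Optional[str] = None,
                file_bytes: Optional[bytes] = None) -> dict:
    """Build the Actor input from a URL or from raw file bytes.

    Exactly one of the two arguments must be given, matching the rule
    that fileUrl and fileBase64 are mutually exclusive.
    """
    if (file_url is None) == (file_bytes is None):
        raise ValueError("Provide exactly one of file_url or file_bytes")
    if file_url is not None:
        return {"fileUrl": file_url}
    # Raw bytes are sent as a base64-encoded ASCII string.
    return {"fileBase64": base64.b64encode(file_bytes).decode("ascii")}
```

For a local file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes as `file_bytes`; the returned dict can be used as the `run_input` of the Python client call.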
| Event | Price | Description |
|---|---|---|
| Actor Start | $0.01 | One-time fee per Actor run |
| Document Conversion | $0.02 | Per successful conversion |
- Single document: $0.03 total ($0.01 start + $0.02 conversion)
- 100 documents: $2.01 ($0.01 start + $2.00 conversions)
- 1,000 documents: $20.01 ($0.01 start + $20.00 conversions)
No subscriptions. No minimums. Pay only for what you use.
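
The pricing arithmetic can be written out as a short function. This is an illustrative sketch only: the two fees come from the pricing table, while the function names and the `runs` parameter (for the case where the documents are spread over several Actor starts) are assumptions. Amounts are kept in whole cents to avoid floating-point rounding.

```python
START_FEE_CENTS = 1        # $0.01 per Actor start
CONVERSION_FEE_CENTS = 2   # $0.02 per successful conversion


def estimate_cost_cents(documents: int, runs: int = 1) -> int:
    """Estimated cost in cents for `documents` successful conversions
    spread over `runs` Actor starts."""
    if documents < 0 or runs < 0:
        raise ValueError("documents and runs must be non-negative")
    return runs * START_FEE_CENTS + documents * CONVERSION_FEE_CENTS


def format_usd(cents: int) -> str:
    """Format a non-negative cent amount as dollars, e.g. 201 -> '$2.01'."""
    return f"${cents // 100}.{cents % 100:02d}"


print(format_usd(estimate_cost_cents(1)))              # $0.03
print(format_usd(estimate_cost_cents(100)))            # $2.01
print(format_usd(estimate_cost_cents(100, runs=100)))  # $3.00
```

The last line shows the upper bound for 100 documents when every document triggers its own Actor start.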
| Metric | Value |
|---|---|
| Average conversion time | < 3 seconds |
| Small files (< 1MB) | < 2 seconds |
| Large files (10MB+) | < 10 seconds |
| Concurrent processing | Unlimited (cloud-scaled) |
| Uptime | 99.95% (Apify SLA) |
The actor gracefully handles:
- Invalid file URLs (404, network errors)
- Unsupported file formats (clear error messages)
- Corrupted files (validation before processing)
- Large files (automatic timeout handling)
All conversions are logged with:
- File type and size
- Conversion duration
- Success/failure status
- Error details (if any)
Coming soon:
- Azure Document Intelligence integration
- OpenAI image description
- Custom OCR settings
- Batch processing mode
- No data retention: Files are processed and immediately deleted
- Encrypted transport: All transfers use HTTPS
- Isolated execution: Each conversion runs in a sandboxed container
- No logging of content: Only metadata is logged
- GDPR compliant: Hosted on Apify's secure infrastructure
A: This is a cloud-hosted service with:
- ✅ No Python installation required
- ✅ No dependency management
- ✅ Automatic scaling for batch processing
- ✅ MCP integration for AI agents
- ✅ 99.95% uptime guarantee
- ✅ Pay-per-use (no server costs)
A: Not currently. Password-protected documents will return an error. Remove protection before conversion.
A: 100 MB hard limit. Files over 50 MB may take longer to process. For larger files, consider splitting them first.
A: Yes! OCR (Optical Character Recognition) is supported for image-based PDFs and image files.
A: Absolutely! The actor runs on Apify's production infrastructure with 99.95% uptime SLA.
A: Markitdown preserves:
- ✅ Headings and structure
- ✅ Bold and italic formatting
- ✅ Lists (ordered and unordered)
- ✅ Tables
- ✅ Links
- ✅ Code blocks
Complex layouts may need manual review.
A: Yes! Run multiple Actor instances in parallel, or use batch mode (contact for enterprise pricing).
Cause: The URL is invalid or the file doesn't exist.
Solution:
- Verify the URL is correct and publicly accessible
- Ensure the file hasn't been deleted or moved
- Check for authentication requirements
Cause: The file extension is not in the supported formats list.
Solution:
- Check the Supported Formats section
- Convert the file to a supported format first
- Contact support if you need a specific format added
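
Unsupported formats and oversized files can be caught before a run is started (and paid for). The sketch below is a hypothetical client-side pre-flight check, not part of the Actor. The extension set is copied from the format tables in this README and may be incomplete, since 29+ formats are advertised; the 100 MB limit is taken from the FAQ and interpreted here as 100 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions listed in this README's format tables (possibly incomplete).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls", ".csv", ".tsv",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp",
    ".html", ".htm", ".xml", ".md",
    ".mp3", ".wav", ".zip",
}
MAX_BYTES = 100 * 1024 * 1024  # assumed meaning of the "100 MB hard limit"


def preflight(url: str, size_bytes: Optional[int] = None) -> list:
    """Return a list of problems found for a file URL (empty if none)."""
    problems = []
    # Look only at the URL path, so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("file exceeds the 100 MB limit")
    return problems
```

URLs without a file extension, such as YouTube links, are reported as unsupported by this check even though the Actor lists them as a valid input, so such URLs should skip it.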
Cause: The file is too large or complex.
Solution:
- Split large files into smaller chunks
- Simplify complex documents
- Increase timeout (contact support for enterprise plans)
Cause: The base64 string is malformed or incomplete.
Solution:
- Verify base64 encoding is correct
- Ensure no truncation occurred during transfer
- Use `fileUrl` instead if possible
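
A malformed or truncated base64 string can often be detected locally before it is sent. The following minimal sketch uses Python's standard `base64` module in strict mode; the helper name `is_valid_base64` is hypothetical. It confirms only that the string is well-formed base64, not that the decoded bytes are a valid document.

```python
import base64
import binascii


def is_valid_base64(data: str) -> bool:
    """Return True if `data` is non-empty, well-formed base64.

    Whitespace (for example line wraps added in transit) is ignored.
    Truncated strings typically fail with a padding error.
    """
    compact = "".join(data.split())
    if not compact:
        return False
    try:
        base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

A truncation that happens to land on a 4-character boundary still decodes cleanly, so comparing the decoded length with the original file size is a useful extra check.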
- MCP Protocol: modelcontextprotocol.io
- Microsoft Markitdown: github.com/microsoft/markitdown
- Apify Platform: docs.apify.com
- Python SDK: docs.apify.com/sdk/python
- Email: support@apify.com
- Discord: apify.com/discord
- Documentation: docs.apify.com
- Bug Reports: GitHub Issues
- Star on GitHub: RECTOR-LABS/markitdown-mcp-server
- Follow Updates: @apify
- Feature Requests: Open a GitHub issue
- Log in to Apify

      apify login

- Deploy the Actor

      apify push

- Enable Standby Mode

  Go to Actor settings and enable standby mode.

- Get Your Actor URL

  Your MCP endpoint will be: https://rector-labs--markitdown-mcp-server.apify.actor/mcp

- Connect AI Agents

  Add the endpoint to Claude Desktop, Cursor, or your favorite MCP client!
This project is built on:
- Microsoft Markitdown: MIT License
- Apify SDK: Apache 2.0 License
- MCP SDK: MIT License
Actor code: MIT License
Built with:
- Microsoft Markitdown - Document conversion library (82k+ ⭐)
- Apify Platform - Serverless cloud infrastructure
- MCP Protocol - AI agent integration standard
Made with ❤️ for the AI developer community