VisionMCP

A standalone MCP server that provides on-device access to Apple's Vision framework for extracting text from PDFs and images. All OCR runs locally through Vision -- no cloud services, no API keys, and no data leaves your machine.

Built with Swift 6.3 and the MCP Swift SDK, targeting macOS 26.

How it works

Two independent parsers, each producing structured PageExtraction results:

PDF ingestion -- renders PDF pages to images via PDFKit, then runs RecognizeDocumentsRequest (macOS 26 Vision API) for structured document OCR. Extracts text, tables, lists, and paragraphs.
Image ingestion -- loads images via CGImageSource, then runs VNRecognizeTextRequest for text OCR. Supports PNG, JPEG, TIFF, BMP, GIF, HEIC, and WebP.
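
As a rough illustration of the image path, the sketch below loads the first frame of an image with CGImageSource and runs VNRecognizeTextRequest on it. It is not the project's ImageParser: the names `recognizeLines`, `RecognizedLine`, and `OCRSketchError`, and the choice of `.accurate` recognition with language correction, are assumptions made for this example.

```swift
import Foundation
import ImageIO
import Vision

// Hypothetical error type for this sketch.
enum OCRSketchError: Error {
    case cannotLoadImage
}

// Hypothetical result type: one recognized line and its confidence.
struct RecognizedLine {
    let text: String
    let confidence: Float
}

func recognizeLines(at url: URL) throws -> [RecognizedLine] {
    // Load the first image in the file via CGImageSource.
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let image = CGImageSourceCreateImageAtIndex(source, 0, nil) else {
        throw OCRSketchError.cannotLoadImage
    }

    // Configure the text recognition request (settings are assumptions).
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    // Run the request synchronously on the loaded image.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    // Keep the top candidate of each observation.
    let observations = request.results ?? []
    return observations.compactMap { observation in
        guard let best = observation.topCandidates(1).first else { return nil }
        return RecognizedLine(text: best.string, confidence: best.confidence)
    }
}
```

Joining the `text` values with newlines would give a plain-text result for the image; averaging `confidence` is one possible way to derive a per-page score.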

Both paths return extracted text and confidence scores, and split the text into chunks with configurable overlap. The server is read-only -- it extracts and returns data with no persistence or database.
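
To make the chunking idea concrete, here is a minimal sketch of overlapping, token-limited chunking. It assumes a "token" is a whitespace-separated word, which may differ from how the project's TextChunker counts tokens; `TextChunk`, `chunkText`, `maxTokens`, and `overlap` are hypothetical names.

```swift
// Hypothetical chunk type for this sketch.
struct TextChunk: Equatable {
    let index: Int
    let content: String
    let tokenCount: Int
}

// Splits text into chunks of at most `maxTokens` words, where each chunk
// repeats the last `overlap` words of the previous one.
func chunkText(_ text: String, maxTokens: Int, overlap: Int) -> [TextChunk] {
    precondition(maxTokens > 0 && overlap >= 0 && overlap < maxTokens,
                 "overlap must be smaller than maxTokens")

    // Assumption: whitespace-separated words stand in for tokens.
    let tokens = text.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    guard !tokens.isEmpty else { return [] }

    var chunks: [TextChunk] = []
    let step = maxTokens - overlap
    var start = 0

    while start < tokens.count {
        let end = min(start + maxTokens, tokens.count)
        let slice = tokens[start..<end]
        chunks.append(TextChunk(index: chunks.count,
                                content: slice.joined(separator: " "),
                                tokenCount: slice.count))
        if end == tokens.count { break }  // the last chunk reached the end
        start += step
    }
    return chunks
}
```

For example, `chunkText("a b c d e f g", maxTokens: 4, overlap: 1)` yields two chunks, "a b c d" and "d e f g", with the word "d" shared between them.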

Requirements

macOS 26 (Tahoe) or later
Xcode 26 beta or later
Swift 6.3 or later

Build

git clone https://codeberg.org/<your-user>/VisionMCP.git
cd VisionMCP
swift build -c release

The release binary is at .build/release/VisionMCP.

Install

sudo ln -sf "$(pwd)/.build/release/VisionMCP" /usr/local/bin/visionmcp

Verify:

visionmcp --version

MCP Configuration

opencode

Add to your project's opencode.json:

{
  "mcp": {
    "visionmcp": {
      "type": "local",
      "command": ["/usr/local/bin/visionmcp"],
      "enabled": true
    }
  }
}

Or add to your global ~/.config/opencode/opencode.json to make it available across all projects.

Tools

`ingest_pdf`

Extracts text from a PDF document using Vision OCR. Returns extracted text, chunks, and metadata.

Parameters:

Name	Type	Required	Description
`file_path`	string	yes	Absolute path to the PDF file

Returns:

raw_text -- full extracted text
chunks -- text split into token-limited chunks with overlap
pages -- per-page extraction with text, confidence, tables, lists, paragraphs
file_hash -- SHA-256 hash of the file
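file_name, page_count, chunk_count, status -- source file name, page and chunk totals, and extraction status (for example "extracted")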
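
The file_hash field can be reproduced on the client side, for example to deduplicate documents. The sketch below computes a lowercase hex SHA-256 digest with CryptoKit; whether the server formats its hash exactly this way is an assumption, and `sha256Hex` and `fileHash` are hypothetical helper names.

```swift
import CryptoKit
import Foundation

// Returns the SHA-256 digest of `data` as a lowercase hex string.
func sha256Hex(of data: Data) -> String {
    SHA256.hash(data: data)
        .map { String(format: "%02x", $0) }
        .joined()
}

// Reads the whole file into memory and hashes it.
func fileHash(atPath path: String) throws -> String {
    let data = try Data(contentsOf: URL(fileURLWithPath: path))
    return sha256Hex(of: data)
}
```

Reading the whole file at once keeps the sketch short; for very large files an incremental hasher (`SHA256()` with repeated `update(data:)` calls) would use less memory.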

`ingest_image`

Extracts text from an image file using Vision OCR. Returns extracted text, chunks, and metadata.

Parameters:

Name	Type	Required	Description
`file_path`	string	yes	Absolute path to the image file

Supports: PNG, JPEG, TIFF, BMP, GIF, HEIC, WebP. Max file size: 250 MB.
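
A caller may want to check these limits before invoking the tool. The following is a sketch only: the accepted extension spellings (such as both "jpg" and "jpeg"), the reading of 250 MB as 250 × 1024 × 1024 bytes, and the names `validateImageFile` and `ImageValidationError` are all assumptions, not the server's actual validation logic.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum ImageValidationError: Error, Equatable {
    case unsupportedExtension(String)
    case fileTooLarge(bytes: Int)
    case unreadable
}

// Assumed extension spellings for the formats listed above.
let supportedExtensions: Set<String> = [
    "png", "jpg", "jpeg", "tif", "tiff", "bmp", "gif", "heic", "webp"
]

// Assumption: "250 MB" means 250 * 1024 * 1024 bytes.
let maxFileSizeBytes = 250 * 1024 * 1024

func validateImageFile(atPath path: String) throws {
    // Check the extension first; this needs no file access.
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    guard supportedExtensions.contains(ext) else {
        throw ImageValidationError.unsupportedExtension(ext)
    }

    // Then read the file size from the file system.
    guard let attributes = try? FileManager.default.attributesOfItem(atPath: path),
          let size = (attributes[.size] as? NSNumber)?.intValue else {
        throw ImageValidationError.unreadable
    }
    guard size <= maxFileSizeBytes else {
        throw ImageValidationError.fileTooLarge(bytes: size)
    }
}
```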

Returns: Same structure as ingest_pdf.

Example response

{
  "file_name": "invoice-001.jpeg",
  "page_count": 1,
  "chunk_count": 2,
  "file_hash": "a258e31c...",
  "raw_text": "Invoice text here...",
  "chunks": "[{\"chunk_index\":0,\"content\":\"...\",\"token_count\":558}]",
  "pages": "[{\"page_number\":1,\"text\":\"...\",\"confidence\":0.97}]",
  "status": "extracted"
}
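
In the example above, chunks and pages appear as JSON-encoded strings rather than nested arrays, so a client would need to decode them in a second step. The sketch below shows one way to do that with Codable. The type names are hypothetical, only the fields visible in the example are modelled, and it assumes the real responses keep this shape.

```swift
import Foundation

// Field names mirror the example response (snake_case converted by the decoder).
struct Chunk: Decodable, Equatable {
    let chunkIndex: Int
    let content: String
    let tokenCount: Int
}

struct Page: Decodable {
    let pageNumber: Int
    let text: String
    let confidence: Double
}

struct IngestResponse: Decodable {
    let fileName: String
    let pageCount: Int
    let chunkCount: Int
    let fileHash: String
    let rawText: String
    let chunks: String  // JSON-encoded array of chunks
    let pages: String   // JSON-encoded array of pages
    let status: String
}

struct DecodedIngest {
    let response: IngestResponse
    let chunks: [Chunk]
    let pages: [Page]
}

func decodeIngestResponse(_ data: Data) throws -> DecodedIngest {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase

    // First pass: the outer object, with chunks/pages still as strings.
    let response = try decoder.decode(IngestResponse.self, from: data)

    // Second pass: decode the embedded JSON strings.
    let chunks = try decoder.decode([Chunk].self, from: Data(response.chunks.utf8))
    let pages = try decoder.decode([Page].self, from: Data(response.pages.utf8))

    return DecodedIngest(response: response, chunks: chunks, pages: pages)
}
```

Any extra keys inside a page object, such as tables or lists, are simply ignored by this decoder.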

Architecture

VisionMCP
├── PDFParser              # Renders pages, runs RecognizeDocumentsRequest
├── PDFDocumentActor       # Thread-safe PDFDocument wrapper (Sendable)
├── ImageParser            # Loads images, runs VNRecognizeTextRequest
├── TextChunker            # Splits text into overlapping token-limited chunks
├── IngestService          # Orchestrates parsing + chunking
├── IngestTools            # MCP tool definitions + handlers
├── ToolRegistry           # Wires MCP server to tools
└── main.swift             # Entry point, stdio transport

No shared protocol, no factory, no reconciliation. Each tool routes directly to its parser.
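
The direct routing described here could look roughly like the sketch below: a plain switch on the tool name, with no protocol or factory in between. The functions `parsePDF` and `parseImage` are placeholders standing in for the real parsers, and `route` and `ToolError` are hypothetical names.

```swift
// Hypothetical error for unknown tool names.
enum ToolError: Error, Equatable {
    case unknownTool(String)
}

// Placeholder parsers; the real ones would run Vision OCR.
func parsePDF(_ path: String) -> String { "pdf:\(path)" }
func parseImage(_ path: String) -> String { "image:\(path)" }

// Each tool name maps straight to its parser.
func route(tool name: String, filePath: String) throws -> String {
    switch name {
    case "ingest_pdf":
        return parsePDF(filePath)
    case "ingest_image":
        return parseImage(filePath)
    default:
        throw ToolError.unknownTool(name)
    }
}
```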

Development

Build

swift build

Test

swift test

Tests use Swift Testing (import Testing, @Test, #expect).

Run locally

swift run VisionMCP

The server communicates over stdio using the MCP protocol.
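
MCP messages are JSON-RPC 2.0 objects, and a tool invocation uses the `tools/call` method. As a sketch of what a client would write to the server's stdin, the helper below builds such a request for one of the tools above. `makeToolCallRequest` is a hypothetical name, and a real session would first need the MCP initialize handshake, which is omitted here.

```swift
import Foundation

// Builds a single-line JSON-RPC 2.0 "tools/call" request.
func makeToolCallRequest(id: Int, tool: String, filePath: String) throws -> String {
    let arguments: [String: Any] = ["file_path": filePath]
    let params: [String: Any] = ["name": tool, "arguments": arguments]
    let message: [String: Any] = [
        "jsonrpc": "2.0",
        "id": id,
        "method": "tools/call",
        "params": params
    ]
    // Sorted keys give stable output; slashes in paths are left unescaped.
    let data = try JSONSerialization.data(
        withJSONObject: message,
        options: [.sortedKeys, .withoutEscapingSlashes]
    )
    return String(decoding: data, as: UTF8.self)
}
```

Calling `makeToolCallRequest(id: 1, tool: "ingest_pdf", filePath: "/tmp/report.pdf")` returns one line of JSON that can be written to the server process followed by a newline.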

License

MIT
