Native OCR for M1/M2/M3/M4 Macs - A high-performance alternative to DeepSeek-OCR optimized for Apple Silicon.
This is a complete OCR (Optical Character Recognition) solution for Apple Silicon Macs built on MLX-VLM, a vision-language toolkit for MLX, Apple's machine learning framework. Instead of requiring NVIDIA GPUs (CUDA), it runs natively on your Mac's Metal GPU.
| Feature | DeepSeek-OCR | MLX-VLM (This Project) |
|---|---|---|
| GPU | NVIDIA only | ✅ Apple M1/M2/M3/M4 |
| Setup | Complex (CUDA, vLLM, flash-attention) | ✅ Ready to use! |
| Performance on M4 | ❌ Won't run | ✅ Native & Fast |
| Memory | High VRAM required | ✅ Optimized for unified memory |
| Quality | Excellent | ✅ Excellent |
```bash
cd /Users/w/AI/28_Deepseek-OCR
source init
```
This activates the environment and sets up helpful commands.
```bash
demo
```
First run will download a ~1.5GB model (subsequent runs are instant).
```bash
mlx-ocr your_image.jpg
```
That's it! 🎉
- Conda environment: `deepseek-ocr-mlx`
- MLX 0.29.3: Apple's ML framework
- MLX-VLM 0.3.4: Vision-Language Models
- All dependencies installed
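To sanity-check the setup, here is a minimal sketch; it assumes only that the packages above are installed in the active environment:

```python
# Print installed versions and the compute device MLX will use.
from importlib.metadata import version

import mlx.core as mx

print("mlx:", version("mlx"))          # expected ~0.29.x
print("mlx-vlm:", version("mlx-vlm"))  # expected ~0.3.x
print("device:", mx.default_device())  # Device(gpu, 0) expected on Apple Silicon
```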
- `init` - Initialize environment with helpful commands
- `quick_demo.py` - Quick test with sample images
- `test_mlx_ocr.py` - Comprehensive test suite
- `README.md` - This file
- `QUICK_START.md` - Command cheat sheet
- `MLX_VLM_GUIDE.md` - Complete documentation
- `DeepSeek-OCR/assets/` - Sample images for testing
After running `source init`, you have access to these commands:
```bash
# Run demo
demo

# Full OCR test
mlx-ocr DeepSeek-OCR/assets/show1.jpg

# Quick OCR extraction
mlx-quick your_image.jpg

# Convert to markdown
mlx-ocr document.png --mode markdown

# Interactive UI
mlx-ui

# Start API server
mlx-server
```

```bash
# Basic OCR
mlx_vlm.generate \
--model mlx-community/Qwen2-VL-2B-Instruct-4bit \
--image your_image.jpg \
--prompt "Extract all text"
# Document to Markdown
mlx_vlm.generate \
--model mlx-community/Qwen2-VL-2B-Instruct-4bit \
--image document.png \
--prompt "Convert to markdown with proper formatting"from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load model
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")
config = load_config("mlx-community/Qwen2-VL-2B-Instruct-4bit")
# OCR
image = ["your_image.jpg"]
prompt = "Extract all text from this image."
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)
output = generate(
    model, processor, formatted_prompt, image,
    max_tokens=1000, temperature=0.3
)
print(output)
```

Models are automatically downloaded on first use and cached locally.
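If you want to pre-fetch a model (for example, before going offline), one option is to call the Hugging Face Hub client directly; a minimal sketch, assuming `huggingface_hub` is available in the environment (it typically comes along with the MLX-VLM install):

```python
# Pre-download a model into the local Hugging Face cache (~/.cache/huggingface by default).
from huggingface_hub import snapshot_download

path = snapshot_download("mlx-community/Qwen2-VL-2B-Instruct-4bit")
print("cached at:", path)
```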
mlx-community/Qwen2-VL-2B-Instruct-4bit
- Size: ~1.5GB
- Speed: Very Fast ⚡
- Quality: Good ✅
- Best for: Quick tests, real-time OCR
mlx-community/Qwen2-VL-7B-Instruct-4bit
- Size: ~4GB
- Speed: Fast ⚡
- Quality: Excellent ✨
- Best for: Production use, complex documents
mlx-community/Qwen2.5-VL-7B-Instruct-4bit
- Size: ~4GB
- Speed: Fast ⚡
- Quality: State-of-the-art 🏆
- Best for: Maximum accuracy, research
To use a different model:
```bash
mlx-ocr image.jpg --model mlx-community/Qwen2-VL-7B-Instruct-4bit
```

- Text extraction from images
- Document scanning
- Handwriting recognition
- Table extraction
- Mathematical equations (LaTeX)
- Multi-language support
- Plain text
- Markdown
- JSON (structured data)
- LaTeX (equations)
- Custom formats via prompts (see the example below)
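As an example of a custom format, the prompt itself carries the output specification. A minimal sketch using the Python API shown above (the input file and the field names in the prompt are just an illustration):

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")
config = load_config("mlx-community/Qwen2-VL-2B-Instruct-4bit")

image = ["receipt.jpg"]  # hypothetical input image
prompt = "Extract the vendor, date, and total amount as a JSON object."
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))
output = generate(model, processor, formatted_prompt, image, max_tokens=500, temperature=0.3)
print(output)
```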
- Interactive web UI (Gradio)
- REST API server (FastAPI)
- Batch processing
- Video understanding
- Audio transcription (with audio models)
```
28_Deepseek-OCR/
├── init                # 🚀 Start here! Environment setup
├── README.md           # This file
├── QUICK_START.md      # Command reference
├── MLX_VLM_GUIDE.md    # Complete documentation
│
├── quick_demo.py       # Quick demo script
├── test_mlx_ocr.py     # Full test suite
│
└── DeepSeek-OCR/       # Original repo (reference)
    └── assets/         # Sample images
        ├── show1.jpg
        ├── show2.jpg
        ├── show3.jpg
        └── show4.jpg
```
```bash
# Use HuggingFace mirror for faster downloads (China)
export HF_ENDPOINT=https://hf-mirror.com

# Cache directory for models
export HF_HOME=~/.cache/huggingface
```

```python
# Temperature (OCR accuracy)
temperature = 0.3  # More deterministic (recommended for OCR)
temperature = 0.7  # Balanced
temperature = 1.0  # More creative

# Max tokens
max_tokens = 500   # Short text
max_tokens = 1000  # Medium documents
max_tokens = 2000  # Long documents
```

```bash
source init
mlx-ui
```
Opens a web interface at http://localhost:7860 where you can:
- Upload images via drag & drop
- Chat with the vision model
- Download results
- Adjust parameters
```bash
source init
mlx-server
```
Then use it with curl:
```bash
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "image": ["/path/to/image.jpg"],
    "prompt": "Extract all text",
    "max_tokens": 1000
  }'
```
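The same endpoint can also be called from Python. A minimal sketch using the `requests` library (an assumption; any HTTP client will do), mirroring the curl payload above:

```python
import requests

# Same payload as the curl example above, sent to the local MLX-VLM server.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "model": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
        "image": ["/path/to/image.jpg"],
        "prompt": "Extract all text",
        "max_tokens": 1000,
    },
    timeout=300,
)
print(response.text)  # raw response body from the server
```

For batch processing a folder of images with the Python API: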
```python
from pathlib import Path
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load model once
model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")
config = load_config("mlx-community/Qwen2-VL-2B-Instruct-4bit")
# Process multiple images
image_dir = Path("path/to/images")
for image_path in image_dir.glob("*.jpg"):
    image = [str(image_path)]
    prompt = "Extract all text"
    formatted_prompt = apply_chat_template(
        processor, config, prompt, num_images=len(image)
    )
    output = generate(model, processor, formatted_prompt, image,
                      max_tokens=1000, temperature=0.3, verbose=False)

    # Save results
    output_path = image_path.with_suffix('.txt')
    output_path.write_text(output)
    print(f"✅ {image_path.name} -> {output_path.name}")
```

- Use clear, high-resolution images
- Set temperature to 0.3 for accuracy
- Use 7B model for complex documents
- Be specific in prompts: "Extract text in reading order" vs "Extract text"
- Include structure in prompt: "Convert to markdown with proper headings"
- Use temperature 0.3-0.5
- Increase max_tokens for long documents
- Prompt: "Extract table data as markdown table"
- Alternative: "Extract table as JSON"
- Use 7B model for complex tables
- Prompt: "Extract mathematical equations in LaTeX format"
- Use temperature 0.3
- 7B model recommended
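If you reuse these task-specific prompts often, it can help to keep them in one place. A small sketch (the preset names are illustrative; the prompts are the ones suggested above):

```python
# Task-specific prompt presets drawn from the tips above (names are illustrative).
OCR_PROMPTS = {
    "text": "Extract text in reading order.",
    "markdown": "Convert to markdown with proper headings.",
    "table": "Extract table data as markdown table.",
    "table_json": "Extract table as JSON.",
    "equations": "Extract mathematical equations in LaTeX format.",
}

def prompt_for(task: str) -> str:
    """Return the preset prompt for a task, falling back to plain text extraction."""
    return OCR_PROMPTS.get(task, "Extract all text.")
```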
On Apple M4 Mac:
| Model | Size | Speed | Quality | RAM Usage |
|---|---|---|---|---|
| 2B-4bit | 1.5GB | ⚡⚡⚡ Very Fast | Good | ~3GB |
| 7B-4bit | 4GB | ⚡⚡ Fast | Excellent | ~6GB |
| 12B-4bit | 7GB | ⚡ Medium | Best | ~10GB |
Note: First run downloads the model. Subsequent runs use cached model (instant startup).
- ✅ High-quality OCR
- ✅ Document understanding
- ✅ Markdown/JSON export
- ✅ Multi-modal capabilities
- ✅ Vision-language models
- ✅ Works on M4 (DeepSeek-OCR requires NVIDIA)
- ✅ Simple setup (already done!)
- ✅ Lower memory usage (4-bit quantization)
- ✅ Native performance (Metal GPU)
- ✅ Unified memory (efficient on Apple Silicon)
Where DeepSeek-OCR still has an edge (on NVIDIA hardware):
- Higher token compression (fewer vision tokens)
- Specifically trained for OCR tasks
- Larger base models available
For M4 Macs, MLX-VLM is the only practical option and provides excellent results!
```bash
# Use mirror (if in China)
export HF_ENDPOINT=https://hf-mirror.com
source init

# Use smaller model (if you run out of memory)
mlx-ocr image.jpg --model mlx-community/Qwen2-VL-2B-Instruct-4bit

# Use larger model (if output quality is poor)
mlx-ocr image.jpg --model mlx-community/Qwen2-VL-7B-Instruct-4bit

# Recreate environment
conda deactivate
conda remove -n deepseek-ocr-mlx --all
conda create -n deepseek-ocr-mlx python=3.12 -y
conda activate deepseek-ocr-mlx
pip install mlx mlx-vlm
```

- Quick Start: `QUICK_START.md` - Command cheat sheet
- Full Guide: `MLX_VLM_GUIDE.md` - Complete documentation
- MLX Docs: https://ml-explore.github.io/mlx/
- MLX-VLM GitHub: https://github.com/Blaizzy/mlx-vlm
- DeepSeek-OCR: https://github.com/deepseek-ai/DeepSeek-OCR
- MLX is Apple's ML framework (like PyTorch for Apple Silicon)
- Uses Metal for GPU acceleration
- Optimized for unified memory architecture
- Lazy evaluation for efficiency (see the snippet after this list)
- Vision-Language Models combine vision + text understanding
- Can "see" images and generate text descriptions
- OCR is just one application
- Can also: describe images, answer questions, extract data
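A tiny example of the lazy-evaluation point above (it assumes only that `mlx` is installed):

```python
import mlx.core as mx

a = mx.ones((2048, 2048))
b = a @ a + 1          # builds a computation graph; nothing is computed yet
mx.eval(b)             # evaluation happens here, on the default (Metal GPU) device
print(b[0, 0].item())  # 2049.0
```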
This is a local setup, but feel free to:
- Test different models
- Create custom prompts
- Share results
- Report issues to MLX-VLM upstream
- MLX: Apache 2.0
- MLX-VLM: MIT
- DeepSeek-OCR: Check original repo
- This setup: Free to use
- Try the demo: `source init && demo`
- Test your images: `mlx-ocr your_image.jpg`
- Explore models: Try 7B for better quality
- Read the guide: `MLX_VLM_GUIDE.md`
- Build something cool: Integrate into your workflow!
- MLX Issues: https://github.com/ml-explore/mlx/issues
- MLX-VLM Issues: https://github.com/Blaizzy/mlx-vlm/issues
- Check docs: `MLX_VLM_GUIDE.md`
🚀 Ready to start? Run `source init` 🚀
Built with ❤️ for Apple Silicon