Complete OCR system built on the DeepSeek-OCR model (released October 2025), with a modern web interface and a full REST API.
- 🤖 Latest AI Model - DeepSeek-OCR optimized for text recognition
- 🌐 Modern Web Interface - Intuitive UI with drag-and-drop upload
- 📊 Progress Tracking - Real-time model download progress bar
- 🎮 Demo Mode - Test the interface without downloading the model
- 🔌 Complete REST API - Built with FastAPI for easy integration
- 🐳 Docker Compose - Deploy in minutes with one command
- ⚡ GPU Accelerated - NVIDIA CUDA support for maximum speed
- 📝 Multiple Modes - Free OCR, Markdown, Grounding, Parse Figure, Detailed
- 🔓 100% Open Source - MIT License for development/testing
- Features
- Quick Start
- Requirements
- Installation
- Usage
- Documentation
- Configuration
- Performance
- Troubleshooting
- Contributing
- Security
- License
- Resources
- Docker 20.10+ and Docker Compose 2.0+
- NVIDIA GPU with CUDA 11.8+ (for GPU acceleration)
- 8GB+ VRAM (recommended for optimal performance)
- 10GB disk space (for model cache)
- Windows 10/11, Linux, or macOS (with Docker Desktop)
git clone https://github.com/YOUR_USERNAME/deepseek-ocr.git
cd deepseek-ocr
cp .env.example .env
# Edit .env if needed (optional)
docker-compose up -d

Once the services are running:
- Web Interface: http://localhost:3000
- API Documentation: http://localhost:8000/docs
- API Health Check: http://localhost:8000/health
When you first access the web interface, you'll see a button to download the DeepSeek-OCR model. Click it and wait for the download to complete (this may take several minutes depending on your internet connection).
Alternatively, use Demo Mode to test the interface without downloading the model.
curl -X POST "http://localhost:8000/api/ocr" \
-F "file=@document.jpg" \
-F "mode=markdown"
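The same request can be sent from Python. The sketch below uses only the standard library and assumes the API listens on the default port 8000 and accepts the same `file` and `mode` form fields as the curl call above. `build_ocr_request` and `run_ocr` are hypothetical helper names, not part of the project.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:8000/api/ocr"  # assumed default backend port


def build_ocr_request(
    image_bytes: bytes, filename: str, mode: str = "markdown", url: str = API_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST with the same 'file' and 'mode' fields as the curl call."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # Plain form field carrying the OCR mode.
    mode_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="mode"\r\n\r\n'
        f"{mode}\r\n"
    ).encode()
    # Header of the file field; the raw image bytes follow it directly.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = mode_part + file_header + image_bytes + closing
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def run_ocr(path: str, mode: str = "markdown", url: str = API_URL, timeout: float = 300.0) -> dict:
    """Upload an image to the running API and return the decoded JSON response."""
    p = Path(path)
    request = build_ocr_request(p.read_bytes(), p.name, mode, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running and the model downloaded, `run_ocr("document.jpg", mode="markdown")["text"]` would return the extracted text. Building the request separately from sending it keeps the multipart encoding testable without a server.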
API Documentation: http://localhost:8000/docs
Detailed documentation is available in the /docs folder:
- QUICKSTART.md - Quick setup guide
- USAGE_GUIDE.md - Complete usage manual
- WINDOWS_SETUP.md - Windows-specific setup guide
Interactive API documentation is available at http://localhost:8000/docs when the server is running.
deepseek-ocr/
├── 📄 Configuration Files
│ ├── docker-compose.yml # Docker orchestration
│ ├── .env.example # Environment template
│ └── .gitignore # Git ignore rules
│
├── 📖 Documentation
│ ├── README.md # Main documentation
│ ├── LICENSE # MIT License (Dev/Test)
│ ├── CONTRIBUTING.md # Contribution guidelines
│ ├── CODE_OF_CONDUCT.md # Code of conduct
│ ├── SECURITY.md # Security policy
│ └── docs/ # Additional documentation
│
├── 🐍 Backend (FastAPI)
│ ├── main.py # API endpoints
│ ├── config.py # Configuration
│ ├── Dockerfile # Container image
│ └── requirements.txt # Python dependencies
│
├── 🌐 Frontend (HTML/JS/CSS)
│ ├── index.html # UI structure
│ ├── app.js # Application logic
│ ├── styles.css # Styling
│ ├── nginx.conf # Web server config
│ └── Dockerfile # Container image
│
├── 💾 Data Directories
│ ├── uploads/ # Uploaded images
│ └── outputs/ # OCR results
│
└── 🧪 Testing
└── test_api.py # API test script
Edit docker-compose.yml or create a .env file to customize:
environment:
- CUDA_VISIBLE_DEVICES=0 # GPU to use
- MODEL_NAME=deepseek-ai/DeepSeek-OCR
- MAX_IMAGE_SIZE=1024 # Maximum resolution
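As an illustration of how these variables might be consumed, here is a minimal sketch of an environment-driven settings object. The `Settings` class is hypothetical and is not taken from backend/config.py; only the variable names and default values come from the snippet above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the docker-compose.yml snippet above.
    model_name: str = "deepseek-ai/DeepSeek-OCR"
    max_image_size: int = 1024
    cuda_visible_devices: str = "0"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        """Read overrides from the environment, falling back to the defaults."""
        max_size = int(env.get("MAX_IMAGE_SIZE", cls.max_image_size))
        if max_size <= 0:
            raise ValueError(f"MAX_IMAGE_SIZE must be positive, got {max_size}")
        return cls(
            model_name=env.get("MODEL_NAME", cls.model_name),
            max_image_size=max_size,
            cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES", cls.cuda_visible_devices),
        )
```

Passing the environment as a mapping keeps the parsing easy to test; in a container, `Settings.from_env()` reads the real `os.environ`.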
curl -X POST "http://localhost:8000/api/ocr" \
-F "file=@image.jpg" \
-F "mode=markdown"
| Mode | Description | Recommended Use |
|---|---|---|
| free_ocr | Fast OCR without structure | General text |
| markdown | Converts to Markdown | Documents |
| grounding | OCR + coordinates | Detailed analysis |
| detailed | Image description | Visual analysis |
{
"text": "# Document Title\n\nExtracted content...",
"mode": "markdown",
"processing_time": 2.5,
"image_size": [1024, 768],
"tokens": 2257
}
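A client can map this response onto a typed object. The sketch assumes the fields shown above are always present and that `image_size` is ordered `[width, height]`; `OcrResult` and `tokens_per_second` are illustrative names, not part of the API.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str
    mode: str
    processing_time: float       # seconds
    image_size: tuple[int, int]  # assumed (width, height)
    tokens: int

    @classmethod
    def from_json(cls, raw: str) -> "OcrResult":
        """Parse the JSON body returned by /api/ocr."""
        data = json.loads(raw)
        width, height = data["image_size"]
        return cls(
            text=data["text"],
            mode=data["mode"],
            processing_time=float(data["processing_time"]),
            image_size=(int(width), int(height)),
            tokens=int(data["tokens"]),
        )

    def tokens_per_second(self) -> float:
        """Rough throughput figure; 0.0 if no processing time was reported."""
        return self.tokens / self.processing_time if self.processing_time > 0 else 0.0
```

For the example response above, `tokens_per_second()` works out to 2257 / 2.5 ≈ 903 tokens per second.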
# Document
"<image>\n<|grounding|>Convert the document to markdown."
# General image
"<image>\n<|grounding|>OCR this image."
# No format
"<image>\nFree OCR."
# Figures
"<image>\nParse the figure."
# Detailed description
"<image>\nDescribe this image in detail."
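These prompts can be selected per request with a lookup table. The mapping from the API mode names in the table above to these prompts is an assumption for illustration, including the hypothetical `parse_figure` key for the Parse Figure mode; check the backend's main.py for the mapping the server actually uses.

```python
# Hypothetical mapping from API mode names to DeepSeek-OCR prompts.
PROMPTS: dict[str, str] = {
    "markdown": "<image>\n<|grounding|>Convert the document to markdown.",
    "grounding": "<image>\n<|grounding|>OCR this image.",
    "free_ocr": "<image>\nFree OCR.",
    "parse_figure": "<image>\nParse the figure.",  # hypothetical mode key
    "detailed": "<image>\nDescribe this image in detail.",
}


def build_prompt(mode: str) -> str:
    """Return the model prompt for an API mode, rejecting unknown modes early."""
    try:
        return PROMPTS[mode]
    except KeyError:
        valid = ", ".join(sorted(PROMPTS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
```

Failing fast on an unknown mode gives the caller a clear error instead of silently sending the model an unexpected prompt.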
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Restart
docker-compose restart
# Rebuild images
docker-compose build --no-cache
curl http://localhost:8000/health
docker-compose logs -f deepseek-ocr-api
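When scripting against the stack (for example in CI), it helps to wait for the health endpoint before sending OCR requests. A minimal polling sketch, assuming `/health` returns HTTP 200 once the API is up (it may report healthy before the model has been downloaded); `wait_for_health` is a hypothetical helper, and the `fetch`/`sleep` parameters exist only so the loop can be exercised without a server.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"  # assumed default backend port


def fetch_status(url: str, timeout: float) -> int:
    """Return the HTTP status of a GET request (raises OSError on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def wait_for_health(
    url: str = HEALTH_URL,
    deadline_s: float = 120.0,
    interval_s: float = 2.0,
    fetch: Callable[[str, float], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it answers HTTP 200; return False once deadline_s has elapsed."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            if fetch(url, interval_s) == 200:
                return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (urllib.error.URLError
            # subclasses OSError) all mean "not ready yet".
            pass
        if time.monotonic() >= end:
            return False
        sleep(interval_s)
```

A typical use after `docker-compose up -d` is `if not wait_for_health(): raise SystemExit("API did not become healthy")`.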
Benchmark results for a 3503×1668-pixel image on an NVIDIA A100 40GB:
| Mode | Time | Quality | Structure |
|---|---|---|---|
| Free OCR | ~24s | ⭐⭐⭐ | Basic |
| Markdown | ~39s | ⭐⭐⭐ | Complete |
| Grounding | ~58s | ⭐⭐ | + Coords |
| Detailed | ~9s | N/A | Description |
Hardware: NVIDIA A100 40GB

Supported resolution modes (input size and resulting vision tokens):
- Tiny: 512×512 (64 tokens)
- Small: 640×640 (100 tokens)
- Base: 1024×1024 (256 tokens) - Recommended
- Large: 1280×1280 (400 tokens)
- Dynamic: multiple crops + base
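The token counts above follow a simple pattern: each equals (side / 64)². A sketch of that arithmetic, assuming the vision encoder splits the image into 16×16-pixel patches and then compresses the patch tokens 16×; the Dynamic-mode helper further assumes each crop is encoded at the Small size (640×640) alongside one Base view. These constants are inferred from the list, not read from the model code.

```python
PATCH_SIZE = 16   # assumed: pixels per patch side
COMPRESSION = 16  # assumed: patch tokens merged into one vision token

RESOLUTION_MODES = {"Tiny": 512, "Small": 640, "Base": 1024, "Large": 1280}


def vision_tokens(side: int) -> int:
    """Vision tokens for a square side x side input under the assumed constants."""
    per_side, remainder = divmod(side, PATCH_SIZE)
    if remainder:
        raise ValueError(f"side {side} is not a multiple of {PATCH_SIZE}")
    patches = per_side * per_side
    return patches // COMPRESSION


def dynamic_tokens(n_crops: int, crop_side: int = 640, base_side: int = 1024) -> int:
    """Hypothetical Dynamic mode: n crops at crop_side plus one global view at base_side."""
    return n_crops * vision_tokens(crop_side) + vision_tokens(base_side)


for name, side in RESOLUTION_MODES.items():
    print(f"{name}: {side}x{side} -> {vision_tokens(side)} tokens")
```

Under these assumptions a Dynamic request with four crops costs 4 × 100 + 256 = 656 vision tokens, and lowering BASE_SIZE from 1024 to 640 cuts the base view from 256 to 100 tokens, consistent with the advice to reduce it when GPU memory is tight.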
# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

If the model download fails:
- Check your internet connection
- Verify disk space (need ~7GB free)
- Use the download button in the web interface
- Check logs:
docker-compose logs -f deepseek-ocr-api
If you run out of GPU memory, reduce the resolution in backend/config.py:
BASE_SIZE = 640 # instead of 1024

If the default ports are already in use, change them in docker-compose.yml:
ports:
- "3001:80" # Frontend (change 3000 to 3001)
- "8001:8000" # Backend (change 8000 to 8001)

For more help, check the documentation or open an issue.
- DeepSeek-OCR Official Repository
- Model on HuggingFace
- DeepSeek Research Paper
- FastAPI Documentation
- Docker Documentation
MIT License - Development and Testing Only
This software is licensed under the MIT License with specific restrictions for development and testing purposes only. It is NOT intended for production use.
See the LICENSE file for full terms and conditions.
- DeepSeek-OCR Model: Subject to its own license terms
- Other dependencies: check the individual package licenses in requirements.txt
Contributions are welcome! Please read our Contributing Guidelines before submitting PRs.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please follow our Code of Conduct in all interactions.
For security concerns, please review our Security Policy.
Key Security Notes:
- No authentication implemented
- Not hardened for production use
- Use at your own risk in production environments
- Report vulnerabilities via GitHub issues with the security label
- Documentation: Check the docs folder
- Issues: Open an issue on GitHub
- Discussions: Use GitHub Discussions for questions
- API Docs: Visit http://localhost:8000/docs when running
Version: 1.0.0
Status: Active Development
Last Updated: October 2025
Model: DeepSeek-OCR (deepseek-ai)
Purpose: Development and Testing Only
If you find this project helpful, please consider:
- Giving it a ⭐ on GitHub
- Sharing it with others
- Contributing improvements
- Reporting bugs and suggestions
Made with ❤️ for the AI community