🚀 DeepSeek OCR - AI-Powered Text Recognition

Complete OCR system built around the DeepSeek-OCR model (released October 2025), with a modern web interface and a full-featured REST API.


⚠️ IMPORTANT: This project is for DEVELOPMENT and TESTING ONLY. Not intended for production use. See LICENSE for details.

✨ Features

  • 🤖 Latest AI Model - DeepSeek-OCR optimized for text recognition
  • 🌐 Modern Web Interface - Intuitive UI with drag-and-drop upload
  • 📊 Progress Tracking - Real-time model download progress bar
  • 🎮 Demo Mode - Test the interface without downloading the model
  • 🔌 Complete REST API - Easy integration with FastAPI
  • 🐳 Docker Compose - Deploy in minutes with one command
  • ⚡ GPU Accelerated - NVIDIA CUDA support for maximum speed
  • 📝 Multiple Modes - Free OCR, Markdown, Grounding, Parse Figure, Detailed
  • 🔓 100% Open Source - MIT License for development/testing

📚 Table of Contents

  • Requirements
  • Quick Start
  • Documentation
  • Architecture
  • Configuration
  • API Usage
  • Prompt Examples
  • Docker Commands
  • Monitoring
  • Performance
  • Supported Resolutions
  • Troubleshooting
  • Resources
  • License
  • Contributing
  • Security
  • Getting Help
  • Project Status

📝 Requirements

  • Docker 20.10+ and Docker Compose 2.0+
  • NVIDIA GPU with CUDA 11.8+ (for GPU acceleration)
  • At least 8GB VRAM (recommended for optimal performance)
  • 10GB disk space (for model cache)
  • Windows 10/11, Linux, or macOS (with Docker Desktop)

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/YOUR_USERNAME/deepseek-ocr.git
cd deepseek-ocr

2. Configure Environment

cp .env.example .env
# Edit .env if needed (optional)

3. Start Services

docker-compose up -d

4. Access the Application

  • Web interface: http://localhost:3000
  • API: http://localhost:8000 (interactive docs at http://localhost:8000/docs)

5. Download Model (First Time)

When you first access the web interface, you'll see a button to download the DeepSeek-OCR model. Click it and wait for the download to complete (this may take several minutes depending on your internet connection).

Alternatively, use Demo Mode to test the interface without downloading the model.
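
If you prefer to fetch the model weights before starting the containers, the sketch below uses the huggingface_hub library to pre-populate a local cache. The cache directory is an assumption and must match whatever volume docker-compose.yml mounts into the API container.

# Hypothetical pre-download sketch. Adjust cache_dir to match the volume
# that docker-compose.yml mounts into the API container.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-OCR",  # model name used elsewhere in this project
    cache_dir="./model_cache",           # assumption: replace with your mounted cache path
)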

API Usage Example

curl -X POST "http://localhost:8000/api/ocr" \
  -F "file=@document.jpg" \
  -F "mode=markdown"

API Documentation: http://localhost:8000/docs
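
The same request can be made from Python. A minimal sketch using the requests library, assuming only the endpoint and form fields shown in the curl command above:

# Minimal Python equivalent of the curl example above.
import requests

with open("document.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/ocr",
        files={"file": f},          # image to process
        data={"mode": "markdown"},  # see "Available Modes" below
    )
resp.raise_for_status()
print(resp.json()["text"])          # extracted text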

📚 Documentation

Detailed documentation is available in the /docs folder.

API Documentation

Interactive API documentation is available at http://localhost:8000/docs when the server is running.


🧰 Architecture

deepseek-ocr/
├── 📄 Configuration Files
│   ├── docker-compose.yml     # Docker orchestration
│   ├── .env.example           # Environment template
│   └── .gitignore             # Git ignore rules
│
├── 📖 Documentation
│   ├── README.md              # Main documentation
│   ├── LICENSE                # MIT License (Dev/Test)
│   ├── CONTRIBUTING.md        # Contribution guidelines
│   ├── CODE_OF_CONDUCT.md     # Code of conduct
│   ├── SECURITY.md            # Security policy
│   └── docs/                  # Additional documentation
│
├── 🐍 Backend (FastAPI)
│   ├── main.py                # API endpoints
│   ├── config.py              # Configuration
│   ├── Dockerfile             # Container image
│   └── requirements.txt       # Python dependencies
│
├── 🌐 Frontend (HTML/JS/CSS)
│   ├── index.html             # UI structure
│   ├── app.js                 # Application logic
│   ├── styles.css             # Styling
│   ├── nginx.conf             # Web server config
│   └── Dockerfile             # Container image
│
├── 💾 Data Directories
│   ├── uploads/               # Uploaded images
│   └── outputs/               # OCR results
│
└── 🧪 Testing
    └── test_api.py            # API test script

🔧 Configuration

Environment Variables

Edit docker-compose.yml or create a .env file to customize:

environment:
  - CUDA_VISIBLE_DEVICES=0          # GPU to use
  - MODEL_NAME=deepseek-ai/DeepSeek-OCR
  - MAX_IMAGE_SIZE=1024              # Maximum resolution
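
A sketch of how these variables might be read on the backend is shown below. It is illustrative only; the actual backend/config.py may differ.

# Illustrative only -- the real backend/config.py may differ.
import os

MODEL_NAME = os.getenv("MODEL_NAME", "deepseek-ai/DeepSeek-OCR")
MAX_IMAGE_SIZE = int(os.getenv("MAX_IMAGE_SIZE", "1024"))      # maximum resolution
CUDA_VISIBLE_DEVICES = os.getenv("CUDA_VISIBLE_DEVICES", "0")  # GPU index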

📖 API Usage

Image OCR Endpoint

curl -X POST "http://localhost:8000/api/ocr" \
  -F "file=@image.jpg" \
  -F "mode=markdown"

Available Modes

Mode        Description                   Recommended Use
free_ocr    Fast OCR without structure    General text
markdown    Converts to Markdown          Documents
grounding   OCR + coordinates             Detailed analysis
detailed    Image description             Visual analysis

Response Example

{
  "text": "# Document Title\n\nExtracted content...",
  "mode": "markdown",
  "processing_time": 2.5,
  "image_size": [1024, 768],
  "tokens": 2257
}
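
Building on the response format above, the sketch below runs one image through every mode and saves each result to a local folder. Field names are taken from the example response; the output paths are arbitrary.

# Sketch: run one image through several modes and save the results.
import pathlib
import requests

modes = ["free_ocr", "markdown", "grounding", "detailed"]
out_dir = pathlib.Path("outputs")   # arbitrary local folder
out_dir.mkdir(exist_ok=True)

for mode in modes:
    with open("document.jpg", "rb") as f:
        resp = requests.post(
            "http://localhost:8000/api/ocr",
            files={"file": f},
            data={"mode": mode},
        )
    resp.raise_for_status()
    result = resp.json()
    (out_dir / f"document.{mode}.txt").write_text(result["text"], encoding="utf-8")
    print(f"{mode}: {result['processing_time']:.1f}s")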

🎯 Prompt Examples

# Document
"<image>\n<|grounding|>Convert the document to markdown."

# General image
"<image>\n<|grounding|>OCR this image."

# No format
"<image>\nFree OCR."

# Figures
"<image>\nParse the figure."

# Detailed description
"<image>\nDescribe this image in detail."

🐳 Docker Commands

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Restart
docker-compose restart

# Rebuild images
docker-compose build --no-cache

🔍 Monitoring

Health Check

curl http://localhost:8000/health
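
When scripting against the API, you can wait for the service to come up by polling this endpoint. A small sketch, assuming a 200 response means the service is ready:

# Sketch: poll /health until the API is ready (assumes 200 == ready).
import time
import requests

def wait_for_api(url="http://localhost:8000/health", timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass
        time.sleep(5)
    return False

if __name__ == "__main__":
    print("API ready" if wait_for_api() else "API not ready after timeout")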

API Logs

docker-compose logs -f deepseek-ocr-api

📊 Performance

Benchmark results with a 3503×1668 pixel test image:

Mode        Time    Quality   Structure
Free OCR    ~24s    ⭐⭐⭐      Basic
Markdown    ~39s    ⭐⭐⭐      Complete
Grounding   ~58s    ⭐⭐       + Coords
Detailed    ~9s     N/A       Description

Hardware: NVIDIA A100 40GB

🛠️ Supported Resolutions

  • Tiny: 512×512 (64 tokens)
  • Small: 640×640 (100 tokens)
  • Base: 1024×1024 (256 tokens) - Recommended
  • Large: 1280×1280 (400 tokens)
  • Dynamic: multiple crops + base
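
For reference, the helper below picks the smallest preset that covers an input image, using only the sizes and token counts listed above. The backend's actual resolution and crop logic may differ.

# Illustrative helper based on the preset list above.
PRESETS = [
    ("tiny", 512, 64),
    ("small", 640, 100),
    ("base", 1024, 256),   # recommended default
    ("large", 1280, 400),
]

def pick_preset(width, height):
    """Return (name, size, vision tokens) of the smallest preset covering the image."""
    longest_side = max(width, height)
    for name, size, tokens in PRESETS:
        if longest_side <= size:
            return name, size, tokens
    return "dynamic", None, None   # larger images fall back to dynamic cropping

print(pick_preset(800, 600))   # -> ('base', 1024, 256)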

🐛 Troubleshooting

GPU Not Detected

# Verify NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Model Not Downloading

  • Check internet connection
  • Verify disk space (need ~7GB free)
  • Use the download button in the web interface
  • Check logs: docker-compose logs -f deepseek-ocr-api

Out of Memory

Reduce resolution in backend/config.py:

BASE_SIZE = 640  # instead of 1024

Port Already in Use

Change ports in docker-compose.yml:

ports:
  - "3001:80"  # Frontend (change 3000 to 3001)
  - "8001:8000"  # Backend (change 8000 to 8001)

For more help, check the documentation or open an issue.

📜 Resources


📝 License

MIT License - Development and Testing Only

This software is licensed under the MIT License with specific restrictions for development and testing purposes only. It is NOT intended for production use.

⚠️ Production Use Warning: If you choose to use this software in production, you do so entirely at your own risk and responsibility. The authors provide no guarantees, support, or liability for production deployments.

See the LICENSE file for full terms and conditions.

Third-Party Components

  • DeepSeek-OCR Model: Subject to its own license terms
  • Other dependencies: Check individual package licenses in requirements.txt

🤝 Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting PRs.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please follow our Code of Conduct in all interactions.


🔒 Security

⚠️ This project is for development and testing only.

For security concerns, please review our Security Policy.

Key Security Notes:

  • No authentication implemented
  • Not hardened for production use
  • Use at your own risk in production environments
  • Report vulnerabilities via GitHub issues with the security label

🚀 Getting Help

  • Documentation: Check the docs folder
  • Issues: Open an issue on GitHub
  • Discussions: Use GitHub Discussions for questions
  • API Docs: Visit http://localhost:8000/docs when running

📌 Project Status

Version: 1.0.0
Status: Active Development
Last Updated: October 2025
Model: DeepSeek-OCR (deepseek-ai)
Purpose: Development and Testing Only


⭐ Show Your Support

If you find this project helpful, please consider:

  • Giving it a ⭐ on GitHub
  • Sharing it with others
  • Contributing improvements
  • Reporting bugs and suggestions

Made with ❤️ for the AI community
