A complete web application for OCR (Optical Character Recognition) processing using the DeepSeek model through Ollama. This application allows you to upload PDF, PNG, or JPG files and extract their text content using AI-powered OCR.
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- API Documentation
- Docker Deployment
- Development
- Testing
- Contributing
- License
- 📤 Drag & Drop File Upload - Support for PDF, PNG, and JPG files
- 👁️ File Preview - Preview uploaded images before processing
- 📝 Custom Prompts - Configure OCR instructions with quick templates
- 📁 File Browser - Manage multiple files in queue
- 📊 Results Viewer - View extracted text with fullscreen mode
- 📋 Copy & Download - Export results as TXT or JSON
- 🎨 Modern UI - Clean, responsive design with turquoise accent color
- ⚡ Real-time Progress - Track upload and processing status
- 🔄 Multiple Input Sources - Process files via upload, URL, base64, or server path
- 📄 PDF Support - Automatic PDF to image conversion
- 🔁 Job Queue - Async processing with status polling
- 📚 Swagger API Docs - Interactive API documentation
- 🛡️ Rate Limiting - Built-in request throttling
- ✅ Validation - Request validation with class-validator
- 🏥 Health Checks - Monitor API and Ollama status
- 🧠 DeepSeek OCR Model - Leverages the full potential of DeepSeek's vision-language model for accurate text extraction
- 🌐 Multi-language - Automatic language detection and processing
- 📐 Layout Preservation - Maintains document structure when requested
- 🔧 Precise Prompts - The model requires specific, well-crafted prompts for optimal results
- 📄 PDF to Image - Automatic conversion of PDF pages to images for DeepSeek OCR processing
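The async job queue mentioned above can be sketched as a small in-memory state machine. This is an illustrative sketch only; `OcrJob` and `JobStore` are hypothetical names, not the app's actual classes:

```typescript
// Minimal in-memory job store sketch. All names here are illustrative;
// the real backend's implementation may differ.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface OcrJob {
  jobId: string;
  fileId: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

class JobStore {
  private jobs = new Map<string, OcrJob>();

  // Called when a file is submitted for processing.
  create(jobId: string, fileId: string): OcrJob {
    const job: OcrJob = { jobId, fileId, status: 'pending' };
    this.jobs.set(jobId, job);
    return job;
  }

  // Called by the worker when OCR finishes successfully.
  complete(jobId: string, result: string): void {
    const job = this.jobs.get(jobId);
    if (job) {
      job.status = 'completed';
      job.result = result;
    }
  }

  get(jobId: string): OcrJob | undefined {
    return this.jobs.get(jobId);
  }
}
```

Clients then poll the job status endpoint until the job reaches a terminal state (`completed` or `failed`) and fetch the result.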
```
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Angular)                      │
│  ┌─────────────┐  ┌────────────────┐  ┌─────────────────┐   │
│  │ File Upload │  │ Prompt Editor  │  │ Results Viewer  │   │
│  └─────────────┘  └────────────────┘  └─────────────────┘   │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP/REST
┌────────────────────────────┴────────────────────────────────┐
│                      Backend (NestJS)                       │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │  OCR Module  │  │   Storage    │  │  Ollama Service  │   │
│  │  - Upload    │  │  - File Mgmt │  │  - API Client    │   │
│  │  - Process   │  │  - PDF Conv  │  │  - Retry Logic   │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTP
┌────────────────────────────┴────────────────────────────────┐
│                    Ollama (DeepSeek OCR)                    │
│  ┌─────────────────────────────────────────────────────────┐│
│  │   DeepSeek VL2 Model - Vision-Language Understanding    ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
```
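The Backend → Ollama hop in the diagram is a plain HTTP call: Ollama's `/api/generate` endpoint accepts a model name, a prompt, and base64-encoded images. A minimal sketch of the request body (`buildOcrRequest` is an illustrative helper, not the app's actual code):

```typescript
// Sketch of the request body sent to Ollama's /api/generate endpoint.
// The field names (model, prompt, images, stream) follow Ollama's REST API;
// the helper itself is illustrative.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data, without a data: URI prefix
  stream: boolean;
}

function buildOcrRequest(imageBase64: string, prompt: string): OllamaGenerateRequest {
  return {
    model: 'deepseek-ocr',
    prompt,
    images: [imageBase64],
    stream: false, // wait for the complete response instead of token streaming
  };
}

// The backend would POST this body to `${OLLAMA_HOST}/api/generate`.
```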
- Node.js 18.x or higher
- npm 9.x or higher
- Ollama with DeepSeek OCR model
Note: PDF conversion is handled natively with the `pdf-to-img` library; no system dependencies such as Poppler are required.
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama
ollama serve

# Pull the DeepSeek OCR model
ollama pull deepseek-ocr
```

```bash
git clone https://github.com/MAMISHO/deepseek-ocr-app.git
cd deepseek-ocr-app
```

```bash
cd backend

# Install dependencies
npm install

# Copy the environment file
cp .env.example .env

# Start the development server
npm run start:dev
```

The backend will be available at http://localhost:3000
```bash
cd frontend

# Install dependencies
npm install

# Start the development server
npm start
```

The frontend will be available at http://localhost:4200
Create a `.env` file in the `backend` directory:
```bash
# Application
APP_NAME=deepseek-ocr
APP_ENV=development
APP_PORT=3000
APP_HOST=0.0.0.0
APP_CORS_ORIGINS=http://localhost:4200,http://localhost:3000

# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=deepseek-ocr
OLLAMA_TIMEOUT=300000
OLLAMA_MAX_RETRIES=3
OLLAMA_RETRY_DELAY=1000

# OCR Configuration
OCR_DEFAULT_LANGUAGE=auto
OCR_DEFAULT_OUTPUT_FORMAT=text
OCR_MAX_PAGES=100
OCR_DEFAULT_PROMPT=Extract all text from this image.

# Storage
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=./uploads
STORAGE_MAX_FILE_SIZE=52428800
STORAGE_ALLOWED_MIMETYPES=application/pdf,image/png,image/jpeg

# Rate Limiting
RATE_LIMIT_ENABLED=true
RATE_LIMIT_TTL=60
RATE_LIMIT_MAX=100

# Logging
LOG_LEVEL=debug
LOG_FORMAT=pretty
```

Edit `frontend/src/environments/environment.ts`:
```typescript
export const environment = {
  production: false,
  apiUrl: 'http://localhost:3000/api',
  maxFileSize: 52428800, // 50 MB
  allowedExtensions: ['pdf', 'png', 'jpg', 'jpeg'],
  allowedMimeTypes: ['application/pdf', 'image/png', 'image/jpeg'],
};
```

- Open http://localhost:4200 in your browser
- Drag & drop or click to upload a file (PDF, PNG, or JPG)
- Optionally modify the prompt or select a quick template
- Click "Start Analysis" to begin OCR processing
- View results in the Results Viewer panel
- Copy to clipboard or download as TXT/JSON
| Prompt | Description |
|---|---|
| Extract Text | Basic text extraction |
| To Markdown | Convert document to markdown format |
| Parse Figure | Analyze charts and diagrams |
| Free OCR | General purpose OCR |
| Layout Analysis | Preserve document layout |
DeepSeek OCR requires precise prompts for optimal results. The following prompts have been tested and work reliably:
| To achieve... | Use prompts like... |
|---|---|
| Simple and reliable text extraction | "Extract all text from this image." "Perform OCR and output the text." |
| Structure a document (clean Markdown) | "Convert the entire document to clean markdown, using appropriate headings and lists. Exclude any non-textual elements or coordinates." |
| Transcribe handwritten text | "Transcribe the handwritten text exactly as it appears." |
| Focus on specific information types | "Extract all text, with a focus on numerical data and dates." "Find and list all names and email addresses in the document." |
| Extract tables | "Extract the table data and format it as a markdown table." |
| Invoice/receipt analysis | "Extract all text from this receipt, including item names, quantities, prices, and total." |
📝 Note: If you discover other prompts that work reliably, please contribute by adding them to this documentation via a Pull Request or Issue.
⚠️ Important: The DeepSeek OCR model is sensitive to prompt precision. Avoid vague or ambiguous prompts for better results.
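The curl example later in this README prefixes the prompt with an `<image>\n` token. A small helper that normalizes user prompts into that shape might look like this (`formatOcrPrompt` is a hypothetical name; the fallback text is the default prompt from the backend configuration):

```typescript
// Illustrative helper: ensure the prompt carries the <image> placeholder,
// as used in the curl example in this README. Not the app's actual code.
function formatOcrPrompt(userPrompt: string): string {
  const trimmed = userPrompt.trim();
  // Fall back to a simple, reliable extraction prompt when empty.
  const prompt = trimmed.length > 0 ? trimmed : 'Extract all text from this image.';
  return prompt.startsWith('<image>') ? prompt : `<image>\n${prompt}`;
}
```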
Access the interactive API documentation at:
http://localhost:3000/api/docs
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/ocr/upload` | Upload file(s) |
| POST | `/api/ocr/process` | Process uploaded file by ID |
| POST | `/api/ocr/process-url` | Process file from URL |
| POST | `/api/ocr/process-base64` | Process file from base64 |
| POST | `/api/ocr/process-path` | Process file from server path |
| GET | `/api/ocr/status/:jobId` | Get job status |
| GET | `/api/ocr/result/:jobId` | Get job result |
| DELETE | `/api/ocr/file/:fileId` | Delete uploaded file |
| GET | `/api/health` | Health check |
| GET | `/api/config` | Public configuration |
```bash
# Upload a file
curl -X POST http://localhost:3000/api/ocr/upload \
  -F "file=@document.png"

# Process the file
curl -X POST http://localhost:3000/api/ocr/process \
  -H "Content-Type: application/json" \
  -d '{"fileId": "uuid-here", "prompt": "<image>\nExtract the text in the image."}'

# Get the result
curl http://localhost:3000/api/ocr/result/{jobId}
```

```bash
# Production (with GPU support)
docker-compose up -d

# Development (CPU only)
docker-compose -f docker-compose.dev.yml up -d
```

```bash
# Build backend
cd backend
docker build -t deepseek-ocr-backend .

# Build frontend
cd frontend
docker build -t deepseek-ocr-frontend .
```

```
┌─────────────────────────────────────────────────────────┐
│                     Docker Network                      │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Frontend   │  │   Backend    │  │    Ollama    │   │
│  │   (nginx)    │──│   (NestJS)   │──│  (DeepSeek)  │   │
│  │   Port: 80   │  │  Port: 3000  │  │ Port: 11434  │   │
│  └──────────────┘  └──────────────┘  └──────────────┘   │
│                                                         │
│  Volumes:                                               │
│    - ollama_data: Model storage                         │
│    - uploads_data: Uploaded files                       │
└─────────────────────────────────────────────────────────┘
```
```
deepseek-ocr-app/
├── backend/
│   ├── src/
│   │   ├── config/          # Configuration
│   │   ├── modules/
│   │   │   ├── ocr/         # OCR processing
│   │   │   ├── ollama/      # Ollama integration
│   │   │   ├── storage/     # File storage
│   │   │   └── health/      # Health checks
│   │   └── main.ts
│   ├── Dockerfile
│   └── package.json
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── core/        # Services, models
│   │   │   ├── features/    # Feature modules
│   │   │   └── shared/      # Shared components
│   │   └── environments/
│   ├── Dockerfile
│   └── package.json
├── docs/
│   └── postman-collection.json
├── docker-compose.yml
└── README.md
```
```bash
# Backend tests
cd backend
npm run test
npm run test:e2e
npm run test:cov

# Frontend tests
cd frontend
npm run test
```

```bash
# Backend
cd backend
npm run lint
npm run format

# Frontend
cd frontend
npm run lint
```

Import the Postman collection from `docs/postman-collection.json` for easy API testing.
The collection includes:
- All API endpoints
- Pre-configured variables
- Example requests
- Complete workflow tests
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- DeepSeek for the OCR model
- Ollama for the local AI runtime
- Angular for the frontend framework
- NestJS for the backend framework
Made with ❤️ by MAMISHO