Docuglean is a unified SDK for intelligent document processing using state-of-the-art AI models. It provides multilingual and multimodal capabilities with plug-and-play APIs for document OCR, structured data extraction, annotation, classification, summarization, and translation. It also ships with built-in tools and supports several document types out of the box.
- 🚀 Easy to Use: Simple, intuitive API with detailed documentation. Just pass in a file and get markdown in response.
- 🔍 OCR Capabilities: Extract text from images and scanned documents
- 📊 Structured Data Extraction: Use Zod/Pydantic schemas for type-safe structured data extraction
- 📄 Multimodal Support: Process PDFs and images with ease
- 🤖 Multiple AI Providers: Support for OpenAI, Mistral, and Google Gemini, with more coming soon
- 🔒 Type Safety: Full TypeScript support with comprehensive types
- summarize: Get structured TLDRs of long documents
- local OCR (PDF): Parse PDFs locally without calling external APIs.
Package: docuglean-ocr
npm install docuglean-ocr
Repository: node-ocr/
Quick Start:
OCR Function - Pure OCR Processing
Extracts text from documents and images, returning content and metadata such as bounding boxes (provider-dependent).
import { ocr, extract } from 'docuglean-ocr';
// Extract raw text from documents (supports URLs and local files)
const ocrResult = await ocr({
  filePath: 'https://arxiv.org/pdf/2302.12854',
  provider: 'openai',
  model: 'gpt-4o-mini',
  apiKey: 'your-api-key'
});
Extract Function - Structured Data Extraction
Extracts structured data from documents using custom schemas. Also handles summarization via custom prompts and a compact schema.
import { z } from 'zod';
// Define schema for structured extraction
const ReceiptSchema = z.object({
  date: z.string(),
  total: z.number(),
  items: z.array(z.object({
    name: z.string(),
    price: z.number()
  }))
});
// Extract structured data from documents
const extractResult = await extract({
  filePath: './receipt.pdf',
  provider: 'mistral',
  model: 'mistral-small-latest',
  apiKey: 'your-api-key',
  responseFormat: ReceiptSchema,
  prompt: 'Extract receipt details including date, total, and items'
});
// Summarization via extract
const SummarySchema = z.object({
  title: z.string().optional(),
  summary: z.string().min(50),
  keyPoints: z.array(z.string()).min(3).max(7),
});
const summary = await extract({
  filePath: './long-report.pdf',
  provider: 'openai',
  apiKey: 'your-api-key',
  responseFormat: SummarySchema,
  prompt: 'Provide a concise 3-sentence summary of this document and 3–7 key points.'
});
console.log('Summary:', summary.summary);
Note: you can also use extract with a targeted "search" prompt (e.g., "Find all occurrences of X and return matching passages") to perform semantic search within a document.
Package: docuglean-ocr
pip install docuglean-ocr
Repository: python-ocr/
Quick Start:
OCR Function - Pure OCR Processing
Extracts text from documents and images, returning content and metadata such as bounding boxes (provider-dependent).
from docuglean import ocr, extract
# Extract raw text from documents (supports URLs and local files)
ocr_result = await ocr(
    file_path="./test/data/testocr.png",
    provider="gemini",
    model="gemini-2.5-flash",
    api_key="your-api-key"
)
Extract Function - Structured Data Extraction
Extracts structured data from documents using custom schemas. Requires a response format schema and returns parsed data.
from pydantic import BaseModel
from typing import List
# Define schema for structured extraction
class Item(BaseModel):
    name: str
    price: float

class Receipt(BaseModel):
    date: str
    total: float
    items: List[Item]
# Extract structured data from documents
extract_result = await extract(
    file_path="./receipt.pdf",
    provider="mistral",
    model="mistral-small-latest",
    api_key="your-api-key",
    response_format=Receipt,
    prompt="Extract receipt details including date, total, and items"
)
- 🏷️ classify(): Document type classifier (receipt, ID, invoice, etc.)
- 🤖 More Models. More Providers: Integration with Meta's Llama, Together AI, OpenRouter and lots more.
- 🌍 Multilingual: Support for multiple languages
- 🎯 Smart Classification: Automatic document type detection
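The Python quick-start snippets call `ocr` and `extract` with `await`, which only works inside a coroutine or an async-aware REPL. A minimal sketch of running such a call from a plain script with `asyncio.run` follows. The `fake_extract` coroutine is a hypothetical stand-in that makes no API call; it is not part of docuglean and exists only so the example runs without credentials.

```python
import asyncio


async def fake_extract(file_path: str, prompt: str) -> dict:
    """Hypothetical stand-in for docuglean's `extract`; makes no API call."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {"file_path": file_path, "prompt": prompt}


async def main() -> dict:
    # In real use, swap fake_extract for `extract` from docuglean and pass
    # provider, model, api_key, and response_format as in the quick start.
    return await fake_extract(
        file_path="./receipt.pdf",
        prompt="Extract receipt details including date, total, and items",
    )


# asyncio.run creates an event loop, runs main() to completion, and closes it.
result = asyncio.run(main())
print(result["file_path"])
```

In a notebook or another environment that already runs an event loop, `await main()` would be used directly instead of `asyncio.run`.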
Currently supported providers and models:
- OpenAI: gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1-mini, o1-preview
- Mistral: mistral-ocr-latest, mistral-small-latest, ministral-8b-latest
- Google Gemini: gemini-2.5-flash, gemini-2.5-pro, gemini-1.5-flash, gemini-1.5-pro
- Hugging Face: Qwen/Qwen2.5-VL-3B-Instruct and other vision-language models (Python only)
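One way to catch a mistyped provider or model before making a paid API call is to check the pair against the list above. The sketch below is an illustration, not part of docuglean: `SUPPORTED_MODELS` and `check_model` are hypothetical names, the provider keys are the strings used in the quick-start snippets, and Hugging Face is left out because its provider string does not appear in this document. The table is a snapshot of the list and may lag behind the SDK.

```python
# Snapshot of the provider/model list above (hypothetical helper, not SDK code).
SUPPORTED_MODELS = {
    "openai": [
        "gpt-4o-mini", "gpt-4o", "gpt-4-turbo",
        "gpt-3.5-turbo", "o1-mini", "o1-preview",
    ],
    "mistral": [
        "mistral-ocr-latest", "mistral-small-latest", "ministral-8b-latest",
    ],
    "gemini": [
        "gemini-2.5-flash", "gemini-2.5-pro",
        "gemini-1.5-flash", "gemini-1.5-pro",
    ],
}


def check_model(provider: str, model: str) -> None:
    """Raise ValueError if the provider/model pair is not in the snapshot."""
    if provider not in SUPPORTED_MODELS:
        known = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"Unknown provider {provider!r}; expected one of: {known}")
    if model not in SUPPORTED_MODELS[provider]:
        known = ", ".join(SUPPORTED_MODELS[provider])
        raise ValueError(f"Unknown model {model!r} for {provider}; expected one of: {known}")


check_model("mistral", "mistral-small-latest")  # passes silently
```

Calling `check_model("openai", "mistral-small-latest")` raises a `ValueError` naming the OpenAI models from the list.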
cd node-ocr
npm install
npm run build
npm test
cd python-ocr
uv sync
uv run pytest
We welcome contributions! Please see our Contributing Guide for details.
Apache 2.0 - see the LICENSE file for details.
⭐ Star this repo to get notified about new releases and updates!
