Selenium39/LLMOCR

LLMOCR

AI-powered OCR service for converting PDFs and images to Markdown with 99.9% accuracy.


License: MIT Next.js TypeScript

Overview

LLMOCR is a comprehensive AI-powered OCR (Optical Character Recognition) service that provides high-accuracy text extraction and document conversion capabilities. Built with Next.js and powered by advanced AI models, it offers multiple OCR features including PDF to Markdown conversion, multilingual text recognition, formula recognition, and key information extraction.

Features

  • PDF to Markdown: Convert PDF documents to Markdown format with high accuracy
  • Image to Markdown: Extract text from images and convert to Markdown
  • Multilingual Text Recognition: Support for multiple languages with advanced recognition
  • Text Recognition: General-purpose text extraction from images
  • Key Information Extraction: Extract structured data from documents
  • Formula Recognition: Recognize mathematical formulas and equations
  • Advanced Recognition: Enhanced OCR capabilities for complex documents
  • API Key Management: Automatic API key rotation and load balancing
  • Subscription System: Flexible credit-based billing with multiple plans
  • User Authentication: Secure login with Google, GitHub, and email
  • Document History: Track and manage all processed documents
  • RESTful API: Developer-friendly API for integration

Tech Stack

Frontend

  • Next.js 15.5.4 - React framework with App Router
  • React 18.3.1 - UI library
  • TypeScript 5.5.3 - Type-safe development
  • Tailwind CSS - Utility-first CSS framework
  • Radix UI - Accessible component primitives
  • Framer Motion - Animation library

Backend

  • Next.js API Routes - Serverless API endpoints
  • NextAuth.js 5.0 - Authentication solution
  • Prisma ORM - Type-safe database access
  • PostgreSQL - Primary database

Infrastructure

  • Cloudflare R2 - File storage
  • Creem - Payment processing
  • Docker - Containerization

Quick Start

Prerequisites

  • Node.js 20 or higher
  • PostgreSQL database
  • pnpm package manager

Installation

  1. Clone the repository
git clone https://github.com/Selenium39/llmocr.git
cd llmocr
  2. Install dependencies
pnpm install
  3. Set up environment variables
cp .env.example .env

Edit the .env file with your configuration (see Environment Variables).

  4. Initialize the database
npx prisma migrate deploy
  5. Run the development server
pnpm dev

Open http://localhost:3000 in your browser.

Environment Variables

Required Configuration

# Application URL
NEXT_PUBLIC_APP_URL=http://localhost:3000
NEXT_PUBLIC_APP_NAME=LLMOCR
FREE_TRIAL_CREDITS=30

# Database
DATABASE_URL='postgresql://user:password@localhost:5432/llmocr'

# Authentication (generate with: openssl rand -base64 32)
AUTH_SECRET=your_secret_key
AUTH_URL=http://localhost:3000
AUTH_TRUST_HOST=true

# OAuth Providers (optional)
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GITHUB_ID=
GITHUB_SECRET=

# Cloudflare R2 Storage
STORAGE_REGION=auto
STORAGE_BUCKET_NAME=your_bucket_name
STORAGE_ACCESS_KEY_ID=your_access_key
STORAGE_SECRET_ACCESS_KEY=your_secret_key
STORAGE_ENDPOINT=https://your_endpoint.r2.cloudflarestorage.com
STORAGE_PUBLIC_URL=https://your_public_url.r2.dev

# Payment Integration (Creem)
CREEM_API_KEY=
CREEM_API_URL=https://api.creem.io
CREEM_WEBHOOK_SECRET=
CREEM_PRODUCT_BASIC=
CREEM_PRODUCT_PRO=
CREEM_PRODUCT_ULTRA=
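
A missing variable from the block above tends to surface only when the first request fails. The following is a minimal sketch of a startup check, not part of the LLMOCR codebase: the helper names `findMissingEnv` and `assertEnv` are hypothetical, and treating exactly these variables as mandatory is an assumption.

```typescript
// Names taken from the "Required Configuration" block; which of them are
// strictly mandatory is an assumption of this sketch.
const REQUIRED_ENV_VARS: string[] = [
  "NEXT_PUBLIC_APP_URL",
  "DATABASE_URL",
  "AUTH_SECRET",
  "STORAGE_BUCKET_NAME",
  "STORAGE_ACCESS_KEY_ID",
  "STORAGE_SECRET_ACCESS_KEY",
  "STORAGE_ENDPOINT",
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnv(env: Env, required: string[] = REQUIRED_ENV_VARS): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws a single error listing every missing variable, so a misconfigured
// deployment fails at startup rather than on the first request.
function assertEnv(env: Env, required: string[] = REQUIRED_ENV_VARS): void {
  const missing = findMissingEnv(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Node.js process this could be called once at boot as `assertEnv(process.env)`.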

API Key Management

LLMOCR uses a database-based API key management system. Configure your OCR provider API keys in the admin dashboard:

  1. Navigate to Admin Panel > API Keys
  2. Add API keys for your OCR providers (MISTRAL, DASHSCOPE)
  3. Keys are automatically rotated using a round-robin algorithm
  4. Failed keys are automatically disabled

Docker Deployment

Using Docker Compose (Recommended)

  1. The project includes a docker-compose.yml file in the root directory. Update the environment variables as needed.

  2. Start the services:

docker-compose up -d

Manual Docker Build

# Build the image
docker build -t llmocr .

# Run the container
docker run -p 3000:3000 \
  -e DATABASE_URL="your_database_url" \
  -e AUTH_SECRET="your_secret" \
  llmocr

API Documentation

Authentication

All API endpoints require authentication via API key or session token.

Using API Key:

curl -X POST "https://your-domain.com/api/pdf-to-markdown?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file_url": "https://example.com/document.pdf"}'

Available Endpoints

PDF to Markdown

POST /api/pdf-to-markdown

Image to Markdown

POST /api/image-to-markdown

Text Recognition

POST /api/text-recognition

Multilingual Text Recognition

POST /api/multilingual-text-recognition

Key Information Extraction

POST /api/key-information-extraction

Formula Recognition

POST /api/formula-recognition

Advanced Recognition

POST /api/advanced-recognition
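
For client code, the endpoint list above can be captured in one typed map so that paths are not retyped at each call site. This is a sketch only: `OCR_ENDPOINTS` and `buildOcrUrl` are hypothetical names, and the `key` query parameter follows the curl example in the Authentication section.

```typescript
// Paths copied from the endpoint list above.
const OCR_ENDPOINTS = {
  pdfToMarkdown: "/api/pdf-to-markdown",
  imageToMarkdown: "/api/image-to-markdown",
  textRecognition: "/api/text-recognition",
  multilingualTextRecognition: "/api/multilingual-text-recognition",
  keyInformationExtraction: "/api/key-information-extraction",
  formulaRecognition: "/api/formula-recognition",
  advancedRecognition: "/api/advanced-recognition",
} as const;

type OcrFeature = keyof typeof OCR_ENDPOINTS;

// Builds the full request URL and appends the API key as the `key` query
// parameter. URLSearchParams takes care of percent-encoding the key.
function buildOcrUrl(baseUrl: string, feature: OcrFeature, apiKey: string): string {
  const url = new URL(OCR_ENDPOINTS[feature], baseUrl);
  url.searchParams.set("key", apiKey);
  return url.toString();
}
```

Because `OcrFeature` is derived from the map, a mistyped feature name is rejected at compile time instead of producing a 404 at runtime.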

Request Example

const response = await fetch('https://your-domain.com/api/image-to-markdown?key=YOUR_API_KEY', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    images: [
      {
        type: 'image_url',
        image_url: 'data:image/jpeg;base64,...'
      }
    ]
  })
});

const result = await response.json();
console.log(result.content);
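
The request example elides the base64 payload. As a rough sketch of how that payload could be produced from raw image bytes, the helpers below build the data URL and the `images` array in the shape shown above. `toDataUrl` and `buildImageRequestBody` are hypothetical names, not part of LLMOCR.

```typescript
// Encodes raw bytes as a data URL. btoa expects a "binary string" with one
// character per byte, so the bytes are converted in a loop first.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Wraps one or more data URLs in the body shape used by the request example.
function buildImageRequestBody(dataUrls: string[]) {
  return {
    images: dataUrls.map((url) => ({ type: "image_url" as const, image_url: url })),
  };
}
```

The result of `buildImageRequestBody` can be passed to `JSON.stringify` and sent as the `body` of the `fetch` call.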

Development

Project Structure

llmocr/
├── app/                    # Next.js app directory
│   ├── api/               # API routes
│   ├── [locale]/          # Internationalized pages
│   └── ...
├── components/            # React components
├── lib/                   # Utility functions and services
│   ├── dto/              # Data transfer objects
│   ├── services/         # Business logic
│   └── ...
├── prisma/               # Database schema and migrations
├── public/               # Static assets
├── config/               # Configuration files
└── content/              # Content for static pages

Database Schema

The application uses Prisma ORM with PostgreSQL. Key models include:

  • User: User accounts and authentication
  • Subscription: User subscription plans and credits
  • ApiKey: OCR provider API key management
  • Document: Processed documents (PDF, Image, etc.)
  • BillingHistory: Transaction records
  • RedeemCode: Promotional codes

Running Migrations

# Generate migration
npx prisma migrate dev --name migration_name

# Apply migrations
npx prisma migrate deploy

# Open Prisma Studio
npx prisma studio

Building for Production

pnpm build
pnpm start

Features in Detail

API Key Rotation System

LLMOCR implements an intelligent API key rotation system:

  • Round-Robin Algorithm: Distributes requests evenly across available keys
  • Automatic Failover: Disables failed keys automatically
  • Usage Tracking: Monitors key usage statistics
  • Load Balancing: Reduces the risk of provider rate limiting by spreading requests across keys
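
The behaviour listed above can be illustrated with a small in-memory sketch. It is not the project's implementation, which stores keys in the database: the `ProviderKey` shape, the `KeyRotator` class, and the failure threshold of 3 are all assumptions made for illustration.

```typescript
// Hypothetical in-memory view of a provider key.
interface ProviderKey {
  id: string;
  enabled: boolean;
  uses: number;     // usage tracking
  failures: number; // consecutive-failure bookkeeping is simplified to a counter
}

class KeyRotator {
  private cursor = 0;
  private readonly keys: ProviderKey[];
  private readonly maxFailures: number;

  // maxFailures is an assumed threshold; the README only says failed keys
  // are disabled automatically.
  constructor(keys: ProviderKey[], maxFailures: number = 3) {
    this.keys = keys;
    this.maxFailures = maxFailures;
  }

  // Returns the next enabled key in round-robin order, skipping disabled
  // keys, or undefined when no key is enabled.
  next(): ProviderKey | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const index = (this.cursor + i) % this.keys.length;
      const key = this.keys[index];
      if (key.enabled) {
        this.cursor = (index + 1) % this.keys.length;
        key.uses += 1;
        return key;
      }
    }
    return undefined;
  }

  // Records a failed request and disables the key once it reaches the threshold.
  reportFailure(id: string): void {
    const key = this.keys.find((k) => k.id === id);
    if (!key) return;
    key.failures += 1;
    if (key.failures >= this.maxFailures) {
      key.enabled = false;
    }
  }
}
```

Advancing the cursor past the key that was just returned is what keeps the distribution even: a disabled key is skipped without the keys after it losing their turn.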

Subscription System

Flexible credit-based billing:

  • Free Trial: 30 pages for new users
  • Basic Plan: 1,000 pages/month
  • Pro Plan: 5,000 pages/month
  • Ultra Plan: 20,000 pages/month
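
As a worked illustration of the quotas above, the sketch below checks whether a document fits into a user's remaining allowance. It assumes one credit corresponds to one page and that usage is tracked as a simple page counter; `PAGE_QUOTA`, `remainingPages`, and `canProcess` are hypothetical names.

```typescript
// Page allowances from the plan list above. The free trial is a one-off
// allowance for new users; the paid plans are per month.
const PAGE_QUOTA = {
  FREE_TRIAL: 30,
  BASIC: 1000,
  PRO: 5000,
  ULTRA: 20000,
} as const;

type Plan = keyof typeof PAGE_QUOTA;

// Pages left in the current allowance, never negative.
function remainingPages(plan: Plan, pagesUsed: number): number {
  return Math.max(0, PAGE_QUOTA[plan] - pagesUsed);
}

// True when a document of pageCount pages fits into what is left.
function canProcess(plan: Plan, pagesUsed: number, pageCount: number): boolean {
  return pageCount <= remainingPages(plan, pagesUsed);
}
```

For example, a Basic user who has already processed 990 pages has 10 left, so a 12-page PDF would be rejected while a 10-page one would go through.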

Document Management

  • Secure cloud storage with Cloudflare R2
  • Document history and tracking
  • Download in multiple formats
  • Automatic cleanup of old files

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Support

Acknowledgments


Made with ❤️ by Selenium39
