LLMOCR

AI-powered OCR service for converting PDFs and images to Markdown with 99.9% accuracy.

Overview

LLMOCR is a comprehensive AI-powered OCR (Optical Character Recognition) service that provides high-accuracy text extraction and document conversion capabilities. Built with Next.js and powered by advanced AI models, it offers multiple OCR features including PDF to Markdown conversion, multilingual text recognition, formula recognition, and key information extraction.

Features

PDF to Markdown: Convert PDF documents to Markdown format with high accuracy
Image to Markdown: Extract text from images and convert to Markdown
Multilingual Text Recognition: Support for multiple languages with advanced recognition
Text Recognition: General-purpose text extraction from images
Key Information Extraction: Extract structured data from documents
Formula Recognition: Recognize mathematical formulas and equations
Advanced Recognition: Enhanced OCR capabilities for complex documents
API Key Management: Automatic API key rotation and load balancing
Subscription System: Flexible credit-based billing with multiple plans
User Authentication: Secure login with Google, GitHub, and email
Document History: Track and manage all processed documents
RESTful API: Developer-friendly API for integration

Tech Stack

Frontend

Next.js 15.5.4 - React framework with App Router
React 18.3.1 - UI library
TypeScript 5.5.3 - Type-safe development
Tailwind CSS - Utility-first CSS framework
Radix UI - Accessible component primitives
Framer Motion - Animation library

Backend

Next.js API Routes - Serverless API endpoints
NextAuth.js 5.0 - Authentication solution
Prisma ORM - Type-safe database access
PostgreSQL - Primary database

Infrastructure

Cloudflare R2 - File storage
Creem - Payment processing
Docker - Containerization

Quick Start

Prerequisites

Node.js 20 or higher
PostgreSQL database
pnpm package manager

Installation

Clone the repository

git clone https://github.com/Selenium39/llmocr.git
cd llmocr

Install dependencies

pnpm install

Set up environment variables

cp .env.example .env

Edit the .env file with your configuration (see Environment Variables)

Initialize the database

npx prisma migrate deploy

Run the development server

pnpm dev

Open http://localhost:3000 in your browser.

Environment Variables

Required Configuration

# Application URL
NEXT_PUBLIC_APP_URL=http://localhost:3000
NEXT_PUBLIC_APP_NAME=LLMOCR
FREE_TRIAL_CREDITS=30

# Database
DATABASE_URL='postgresql://user:password@localhost:5432/llmocr'

# Authentication (generate with: openssl rand -base64 32)
AUTH_SECRET=your_secret_key
AUTH_URL=http://localhost:3000
AUTH_TRUST_HOST=true

# OAuth Providers (optional)
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GITHUB_ID=
GITHUB_SECRET=

# Cloudflare R2 Storage
STORAGE_REGION=auto
STORAGE_BUCKET_NAME=your_bucket_name
STORAGE_ACCESS_KEY_ID=your_access_key
STORAGE_SECRET_ACCESS_KEY=your_secret_key
STORAGE_ENDPOINT=https://your_endpoint.r2.cloudflarestorage.com
STORAGE_PUBLIC_URL=https://your_public_url.r2.dev

# Payment Integration (Creem)
CREEM_API_KEY=
CREEM_API_URL=https://api.creem.io
CREEM_WEBHOOK_SECRET=
CREEM_PRODUCT_BASIC=
CREEM_PRODUCT_PRO=
CREEM_PRODUCT_ULTRA=
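
A missing setting is easier to diagnose at startup than at the first failed request. The following is a minimal sketch of such a check, not the project's own validation. The helper name `findMissingEnv` and the choice of which variables count as required are assumptions made for illustration.

```typescript
// Hypothetical list of variables treated as required; adjust to your deployment.
const REQUIRED_ENV = [
  "NEXT_PUBLIC_APP_URL",
  "DATABASE_URL",
  "AUTH_SECRET",
  "STORAGE_BUCKET_NAME",
  "STORAGE_ACCESS_KEY_ID",
  "STORAGE_SECRET_ACCESS_KEY",
  "STORAGE_ENDPOINT",
] as const;

// Hypothetical helper: return the names that are unset or blank.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Example with a partial configuration; in an application you would pass process.env.
const missingVars = findMissingEnv({
  NEXT_PUBLIC_APP_URL: "http://localhost:3000",
  DATABASE_URL: "postgresql://user:password@localhost:5432/llmocr",
  AUTH_SECRET: "",
});
console.log(missingVars);
```

An empty `AUTH_SECRET` is reported as missing, so a placeholder left blank in .env is caught as well.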

API Key Management

LLMOCR stores OCR provider API keys in the database rather than in environment variables. Configure them in the admin dashboard:

Navigate to Admin Panel > API Keys
Add API keys for your OCR providers (MISTRAL, DASHSCOPE)
Keys are automatically rotated using a round-robin algorithm
Failed keys are automatically disabled

Docker Deployment

Using Docker Compose (Recommended)

The project includes a docker-compose.yml file in the root directory. Update the environment variables as needed.
Start the services:

docker-compose up -d

Manual Docker Build

# Build the image
docker build -t llmocr .

# Run the container
docker run -p 3000:3000 \
  -e DATABASE_URL="your_database_url" \
  -e AUTH_SECRET="your_secret" \
  llmocr

API Documentation

Authentication

All API endpoints require authentication via API key or session token.

Using API Key:

curl -X POST "https://your-domain.com/api/pdf-to-markdown?key=YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"file_url": "https://example.com/document.pdf"}'

Available Endpoints

PDF to Markdown

POST /api/pdf-to-markdown

Image to Markdown

POST /api/image-to-markdown

Text Recognition

POST /api/text-recognition

Multilingual Text Recognition

POST /api/multilingual-text-recognition

Key Information Extraction

POST /api/key-information-extraction

Formula Recognition

POST /api/formula-recognition

Advanced Recognition

POST /api/advanced-recognition
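
Since every endpoint above is a POST under /api and the authentication example passes the API key as a `key` query parameter, a client can keep the paths in one place. This is a minimal sketch; the map keys and the `endpointUrl` helper are illustrative names, not part of LLMOCR.

```typescript
// Endpoint paths as listed above; the property names are illustrative.
const ENDPOINTS = {
  pdfToMarkdown: "/api/pdf-to-markdown",
  imageToMarkdown: "/api/image-to-markdown",
  textRecognition: "/api/text-recognition",
  multilingualTextRecognition: "/api/multilingual-text-recognition",
  keyInformationExtraction: "/api/key-information-extraction",
  formulaRecognition: "/api/formula-recognition",
  advancedRecognition: "/api/advanced-recognition",
} as const;

type EndpointName = keyof typeof ENDPOINTS;

// Hypothetical helper: join the base URL, the path, and the `key` query parameter.
function endpointUrl(baseUrl: string, endpoint: EndpointName, apiKey: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}${ENDPOINTS[endpoint]}?key=${encodeURIComponent(apiKey)}`;
}

console.log(endpointUrl("https://your-domain.com/", "formulaRecognition", "YOUR_API_KEY"));
// prints "https://your-domain.com/api/formula-recognition?key=YOUR_API_KEY"
```

Encoding the key with `encodeURIComponent` keeps the URL valid if a key ever contains characters such as `+` or `&`.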

Request Example

const response = await fetch('https://your-domain.com/api/image-to-markdown?key=YOUR_API_KEY', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    images: [
      {
        type: 'image_url',
        image_url: 'data:image/jpeg;base64,...'
      }
    ]
  })
});

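// Fail early on non-2xx responses before parsing the body
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
console.log(result.content);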
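
The request body in the example above can be assembled from raw base64 strings. The sketch below assumes the `images` array accepts more than one entry, which the example suggests but does not confirm. `imageRequestBody` is a hypothetical helper name.

```typescript
// Shape of one entry in the `images` array, taken from the request example.
interface ImageInput {
  type: "image_url";
  image_url: string;
}

// Hypothetical helper: wrap base64 payloads as data URLs in the expected body shape.
function imageRequestBody(base64Images: string[], mimeType: string = "image/jpeg"): { images: ImageInput[] } {
  return {
    images: base64Images.map((data) => ({
      type: "image_url" as const,
      image_url: `data:${mimeType};base64,${data}`,
    })),
  };
}

const requestBody = imageRequestBody(["AAAA"], "image/png");
console.log(JSON.stringify(requestBody));
// prints {"images":[{"type":"image_url","image_url":"data:image/png;base64,AAAA"}]}
```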

Development

Project Structure

llmocr/
├── app/                    # Next.js app directory
│   ├── api/               # API routes
│   ├── [locale]/          # Internationalized pages
│   └── ...
├── components/            # React components
├── lib/                   # Utility functions and services
│   ├── dto/              # Data transfer objects
│   ├── services/         # Business logic
│   └── ...
├── prisma/               # Database schema and migrations
├── public/               # Static assets
├── config/               # Configuration files
└── content/              # Content for static pages

Database Schema

The application uses Prisma ORM with PostgreSQL. Key models include:

User: User accounts and authentication
Subscription: User subscription plans and credits
ApiKey: OCR provider API key management
Document: Processed documents (PDF, Image, etc.)
BillingHistory: Transaction records
RedeemCode: Promotional codes

Running Migrations

# Generate migration
npx prisma migrate dev --name migration_name

# Apply migrations
npx prisma migrate deploy

# Open Prisma Studio
npx prisma studio

Building for Production

pnpm build
pnpm start

Features in Detail

API Key Rotation System

LLMOCR implements an intelligent API key rotation system:

Round-Robin Algorithm: Distributes requests evenly across available keys
Automatic Failover: Disables failed keys automatically
Usage Tracking: Monitors key usage statistics
Load Balancing: Prevents rate limiting by rotating keys
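
The four behaviors above can be illustrated with a small in-memory rotator. This is a sketch of the general technique only: LLMOCR keeps its keys in the database, and the `ProviderKey` shape, the `KeyRotator` class, and the key ids used here are all hypothetical.

```typescript
// Hypothetical in-memory model of a provider key.
interface ProviderKey {
  id: string;
  enabled: boolean;
  uses: number;
}

class KeyRotator {
  private keys: ProviderKey[];
  private cursor = 0;

  constructor(keys: ProviderKey[]) {
    this.keys = keys;
  }

  // Round-robin: return the next enabled key after the last one handed out,
  // or undefined when every key has been disabled.
  next(): ProviderKey | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const index = (this.cursor + i) % this.keys.length;
      const key = this.keys[index];
      if (key !== undefined && key.enabled) {
        this.cursor = (index + 1) % this.keys.length;
        key.uses += 1; // usage tracking
        return key;
      }
    }
    return undefined;
  }

  // Failover: take a failing key out of rotation.
  markFailed(id: string): void {
    const key = this.keys.find((k) => k.id === id);
    if (key !== undefined) key.enabled = false;
  }
}

const rotator = new KeyRotator([
  { id: "mistral-1", enabled: true, uses: 0 },
  { id: "mistral-2", enabled: true, uses: 0 },
  { id: "dashscope-1", enabled: true, uses: 0 },
]);

const first = rotator.next();    // mistral-1
rotator.markFailed("mistral-2"); // simulate a provider error on this key
const second = rotator.next();   // mistral-2 is skipped, so dashscope-1
const third = rotator.next();    // wraps around to mistral-1
console.log(first?.id, second?.id, third?.id);
```

A disabled key stays out of rotation until something re-enables it. Whether and when LLMOCR re-enables keys is not described here.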

Subscription System

Flexible credit-based billing:

Free Trial: 30 pages for new users
Basic Plan: 1,000 pages/month
Pro Plan: 5,000 pages/month
Ultra Plan: 20,000 pages/month
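
The allowances above translate into a simple balance check before a job is accepted. The sketch assumes one credit per page and that a job exceeding the remaining balance is rejected outright. Neither rule is confirmed by the plan list, and `remainingAfter` is a hypothetical name.

```typescript
// Page allowances from the plan list above (the free trial is a one-time grant).
const PLAN_PAGES = { free: 30, basic: 1000, pro: 5000, ultra: 20000 } as const;

type Plan = keyof typeof PLAN_PAGES;

// Hypothetical rule: one credit per page. Returns the balance left after the job,
// or null when the job would exceed the remaining allowance.
function remainingAfter(plan: Plan, pagesUsed: number, pagesRequested: number): number | null {
  const remaining = PLAN_PAGES[plan] - pagesUsed;
  return pagesRequested <= remaining ? remaining - pagesRequested : null;
}

console.log(remainingAfter("basic", 990, 8));  // 2
console.log(remainingAfter("basic", 990, 12)); // null
```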

Document Management

Secure cloud storage with Cloudflare R2
Document history and tracking
Download in multiple formats
Automatic cleanup of old files
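
Age-based cleanup comes down to comparing each document's creation time against a cutoff. The retention period is not stated here, so `retentionDays` in this sketch is a caller-supplied assumption. `StoredDocument` and `selectExpired` are hypothetical names.

```typescript
// Hypothetical minimal view of a stored document.
interface StoredDocument {
  id: string;
  createdAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the documents older than the retention window, relative to `now`.
function selectExpired(docs: StoredDocument[], now: Date, retentionDays: number): StoredDocument[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return docs.filter((doc) => doc.createdAt.getTime() < cutoff);
}

const expiredDocs = selectExpired(
  [
    { id: "a", createdAt: new Date("2025-01-01T00:00:00Z") },
    { id: "b", createdAt: new Date("2025-01-30T00:00:00Z") },
  ],
  new Date("2025-01-31T00:00:00Z"),
  7, // assumed retention window in days
);
console.log(expiredDocs.map((doc) => doc.id)); // [ 'a' ]
```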

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Support

Email: selenium39@qq.com
GitHub Issues: Create an issue

Acknowledgments

Built with Next.js
Powered by AI OCR services
UI components from Radix UI
Styled with Tailwind CSS

Made with ❤️ by Selenium39
