An ultra-fast PDF preview solution designed for AI / RAG scenarios
Open 1000+ page PDFs instantly!
In AI / RAG applications, PDFs are one of the most common data sources, such as:
- 📚 Academic papers
- 📑 Technical documentation
- 📊 Research reports
- 📖 E-books
However, traditional PDF renderers perform very poorly with large files:
| Page Count | react-pdf Load Time |
|---|---|
| 50 pages | 7s |
| 344 pages | 109s |
| 1000 pages | 240s |
In AI applications, users often need to quickly open the original document to check the citation source.
Waiting tens of seconds or even minutes creates a terrible user experience.
LM PDF has only one goal:
Make PDF loading speed independent of page count
Whether the document has 50, 500, or 1000 pages, it should still load near-instantly.
[Demo video: opening a 290-page PDF]
- ⚡ Instant PDF loading
- 🧠 Loading time nearly independent of page count
- 📉 Extremely low memory usage
- 🎯 Very few DOM nodes
- 🤖 Perfect for AI / RAG scenarios
Traditional PDF rendering workflow:
PDF
↓
pdf.js parsing
↓
Parse fonts / layout / text
↓
Canvas rendering
The problems:
- Parsing PDF syntax is complex
- Font parsing is expensive
- The larger the document, the slower it becomes
Treat the PDF as an image sequence
PDF
↓
Server-side page splitting
↓
page1.png
page2.png
page3.png
The frontend only needs to:
Load images on demand
No PDF parsing is required.
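A minimal sketch of what "load images on demand" can look like on the frontend. The URL scheme in `pageImageUrl` and the `PageImageCache` class are hypothetical names used for illustration; the README does not document the real endpoint or the component's internals. The loader is injected so the same cache works with any fetch mechanism.

```typescript
// Hypothetical URL scheme for pre-split page images (page1.png, page2.png, ...).
// The real backend route is an assumption, not taken from the project.
function pageImageUrl(baseUrl: string, docId: string, page: number): string {
  return `${baseUrl}/${encodeURIComponent(docId)}/page${page}.png`;
}

// Loads each page at most once, and only when it is first requested.
// T is whatever the injected loader returns (for example a Promise of an image).
class PageImageCache<T> {
  private readonly entries = new Map<number, T>();
  private readonly load: (page: number) => T;

  constructor(load: (page: number) => T) {
    this.load = load;
  }

  // Returns the cached entry, calling the loader only on the first request.
  get(page: number): T {
    if (!this.entries.has(page)) {
      this.entries.set(page, this.load(page));
    }
    return this.entries.get(page) as T;
  }

  // Number of pages that have been requested so far.
  get size(): number {
    return this.entries.size;
  }
}
```

In a browser, the loader would typically create an `Image`, set its `src` to `pageImageUrl(...)`, and return a promise that resolves once the image has loaded. Pages that are never scrolled into view are never requested.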
    PDF File
       │
       │
┌──────▼──────┐
│   Backend   │
│             │
│ PDF → Image │
│   PyMuPDF   │
│  pdf2image  │
└──────┬──────┘
       │
       │ HTTP
       │
┌──────▼──────┐
│  Frontend   │
│             │
│    React    │
│ react-konva │
│   Canvas    │
│VirtualScroll│
└─────────────┘
LM PDF's performance comes from three key design decisions.
Traditional PDF viewers use:
DOM + Canvas + Text Layer
The number of DOM nodes grows with the number of pages.
LM PDF uses:
Canvas
└ Image
Only images are rendered.
The number of DOM nodes remains almost constant.
Even if the PDF has 1000 pages, a user can see at most 3 to 5 pages at a time, so only the visible pages are rendered.
Example:
Total pages: 1000
Actually rendered: 5
This is virtual scrolling.
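The visible-range calculation behind virtual scrolling can be sketched as follows. This is an illustration under simplifying assumptions, not the project's actual code: every page is assumed to occupy the same height, and `visiblePageRange` and its `overscan` parameter are hypothetical names.

```typescript
// 1-based, inclusive range of pages that should be rendered.
interface VisibleRange {
  first: number;
  last: number;
}

// Assumes all pages share one slot height (page plus gap), in pixels.
// `overscan` renders a few extra pages above and below the viewport
// so that fast scrolling does not show blank areas.
function visiblePageRange(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number,
  totalPages: number,
  overscan: number = 1,
): VisibleRange {
  if (totalPages <= 0 || pageHeight <= 0) {
    return { first: 1, last: 0 }; // empty range
  }
  // Page containing the top edge of the viewport.
  const firstVisible = Math.floor(scrollTop / pageHeight) + 1;
  // Page containing the bottom edge of the viewport.
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / pageHeight);
  const first = Math.min(totalPages, Math.max(1, firstVisible - overscan));
  const last = Math.min(totalPages, Math.max(firstVisible, lastVisible) + overscan);
  return { first, last: Math.max(first, last) };
}
```

With 1000 pages of 1000 px each and a 900 px viewport scrolled to the middle of the document (`scrollTop = 499500`), this returns pages 499 to 502: four rendered pages out of 1000, regardless of document length.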
During scrolling, the entire page is not re-rendered. Instead, existing elements are diffed and only their positions are updated:
Old elements
↓
diff
↓
position updates
This ensures smooth and fluid scrolling.
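One way to picture the diff step is a comparison between the previously rendered pages and the pages that should be rendered now. The README does not describe how the diff is implemented, so `diffVisiblePages` and `pageTop` below are hypothetical helpers, and uniform page height is again an assumption.

```typescript
// Result of comparing two sets of rendered page numbers.
interface PageDiff {
  enter: number[]; // pages that must be newly created
  stay: number[];  // pages that are reused; only their position changes
  leave: number[]; // pages that can be removed
}

function diffVisiblePages(prev: readonly number[], next: readonly number[]): PageDiff {
  const prevSet = new Set(prev);
  const nextSet = new Set(next);
  return {
    enter: next.filter((p) => !prevSet.has(p)),
    stay: next.filter((p) => prevSet.has(p)),
    leave: prev.filter((p) => !nextSet.has(p)),
  };
}

// Vertical offset of a 1-based page inside the scrollable content,
// assuming every page occupies the same height.
function pageTop(page: number, pageHeight: number): number {
  return (page - 1) * pageHeight;
}
```

Scrolling down by one page from pages 499 to 502 yields one page to create (503), one to remove (499), and three that stay and are simply repositioned, so the per-frame work stays small however long the document is.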
Test environment:
- Network speed: 13.9 Mbps
| PDF Pages | react-pdf | LM PDF |
|---|---|---|
| 3 pages | 3.5s | 1s |
| 50 pages | 7s | 1.5s |
| 344 pages | 109s | 2.5s |
| 1000 pages | 240s | 2.5s |
Conclusion:
LM PDF's loading speed is almost unaffected by the number of pages.
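To make the comparison concrete, the speedup for each row of the table above can be computed directly. The numbers are copied from the table; the `Benchmark` type and `speedup` function are only illustrative names.

```typescript
interface Benchmark {
  pages: number;
  reactPdfSeconds: number;
  lmPdfSeconds: number;
}

// Load times taken from the benchmark table (13.9 Mbps network).
const benchmarks: Benchmark[] = [
  { pages: 3, reactPdfSeconds: 3.5, lmPdfSeconds: 1 },
  { pages: 50, reactPdfSeconds: 7, lmPdfSeconds: 1.5 },
  { pages: 344, reactPdfSeconds: 109, lmPdfSeconds: 2.5 },
  { pages: 1000, reactPdfSeconds: 240, lmPdfSeconds: 2.5 },
];

// How many times faster LM PDF loaded than react-pdf for one row.
function speedup(b: Benchmark): number {
  return b.reactPdfSeconds / b.lmPdfSeconds;
}

for (const b of benchmarks) {
  console.log(`${b.pages} pages: ${speedup(b).toFixed(1)}x faster`);
}
// 3 pages: 3.5x faster
// 50 pages: 4.7x faster
// 344 pages: 43.6x faster
// 1000 pages: 96.0x faster
```

The ratio grows with page count because the react-pdf time rises from 3.5s to 240s while the LM PDF time only moves from 1s to 2.5s.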
LM PDF is especially suitable for the following uses.
Displaying sources in AI / RAG applications:
- Original citations
- Document context
- Knowledge sources
Quickly browsing long documents:
- Research papers
- Books
- Technical documentation
Viewing reports, for example:
- Annual reports
- Research reports
- Data reports
lm-pdf
├── backend
│   ├── FastAPI
│   ├── PyMuPDF
│   └── pdf2image
│
├── packages
│   ├── easy-pdf
│   │     React component library
│   │
│   └── web
│         Demo application
│
├── pnpm-workspace.yaml
└── package.json
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

pnpm install
pnpm dev

Visit:
http://localhost:5173
- FastAPI
- PyMuPDF
- pdf2image
- React 18
- TypeScript
- Vite
- react-konva
- Recoil
- ⚡ Extremely fast initial loading
- 📉 Very low memory usage
- 🎯 Smooth scrolling experience
- 📦 Very few DOM nodes
- 🤖 Ideal for AI applications
- Copying PDF text is not supported yet
- PDFs must be pre-split on the backend
Planned features:
- PDF text layer
- Search
- Annotations
- Highlighting
- WebAssembly rendering
Issues and Pull Requests are welcome.
If this project helps you, please consider giving it a ⭐.
MIT License