A browser extension that extracts sections from PDF files by keyword search. It works on both locally downloaded PDFs and online PDFs opened in Edge or Chrome.
- Keyword search across all sections of a PDF
- Visual section browser with checkboxes for multi-select
- Edit extracted text before copying
- Works on `file:///` local PDFs and online PDFs
- No external dependencies or AI APIs; everything runs locally
- Clone or download this repo
- Open `edge://extensions/` (or `chrome://extensions/`)
- Enable Developer mode
- Click Load unpacked → select this folder
- On the extension details page, enable Allow access to file URLs
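The file-access step above matters only for local PDFs. The following is a minimal sketch, not the extension's actual code, of how a script might tell local and online PDFs apart and warn when file access has not been granted. The helper names `classifyPdfUrl` and `warnIfFileAccessMissing` are hypothetical, and the sketch assumes PDF URLs end in `.pdf`. `chrome.extension.isAllowedFileSchemeAccess` is the browser API being relied on.

```javascript
// Classify a tab URL as a local PDF, an online PDF, or neither.
// Assumption: the URL path ends in ".pdf"; PDFs served without that
// extension would need a content-type check instead.
function classifyPdfUrl(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return 'not-pdf'; // not a parseable URL
  }
  if (!parsed.pathname.toLowerCase().endsWith('.pdf')) return 'not-pdf';
  if (parsed.protocol === 'file:') return 'local';
  if (parsed.protocol === 'http:' || parsed.protocol === 'https:') return 'online';
  return 'not-pdf';
}

// Warn when a local PDF is open but file access has not been granted.
// `extensionApi` is expected to be `chrome.extension`; it is passed in
// so the function can also be exercised outside the browser.
function warnIfFileAccessMissing(url, extensionApi, warn) {
  if (classifyPdfUrl(url) !== 'local') return;
  extensionApi.isAllowedFileSchemeAccess((allowed) => {
    if (!allowed) warn('Enable "Allow access to file URLs" for this extension.');
  });
}
```

In the browser this could be called as `warnIfFileAccessMissing(location.href, chrome.extension, console.warn)`.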
- Open any PDF in Edge or Chrome
- A purple 📋 N sections button appears at the bottom-right
- Click it to browse all sections, or use the extension popup to search by keyword
- Select sections → Copy or Edit before copying
- Microsoft Edge (Chromium)
- Google Chrome
- Clone the repository

      git clone https://github.com/yourusername/ai-pdf-extractor.git
      cd ai-pdf-extractor

- Load in Chrome/Edge/Brave
  - Open `chrome://extensions/` (or `edge://extensions/`)
  - Enable "Developer mode" (toggle in top-right)
  - Click "Load unpacked"
  - Select the extension folder
- Open
- Create Icons (Optional)
  - Create 16x16, 48x48, and 128x128 px icons
  - Name them `icon16.png`, `icon48.png`, `icon128.png`
  - Place them in the `icons/` folder
  - Or comment out the icon references in `manifest.json`
- Open any PDF in your browser
- Click the extension icon
- Type your query: "introduction", "methodology", "results"
- Press Enter or click "Find & Copy"
- Content copied! ✓
✓ "introduction" → Finds Introduction section
✓ "methodology" → Finds Methods/Methodology
✓ "section about results" → Finds Results/Findings
✓ "chapter 3" → Finds Chapter 3
✓ "experimental design" → Finds related sections
- Model: Universal Sentence Encoder (Google)
- Framework: TensorFlow.js
- Size: ~50MB (cached after first load)
- Processing: 100% local (no data sent to servers)
PDF → Parse Structure → Detect Sections → AI Embeddings
↓
User Query → AI Embedding → Semantic Match → Copy!
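The "Semantic Match" step in the flow above is commonly done by comparing embedding vectors with cosine similarity. The sketch below shows that idea with plain arrays. It is an illustration under assumptions, not the code in `ai-model.js`: the function names and the shape of the `sections` objects are hypothetical, and the default threshold and result count simply mirror the configuration values this README lists.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Embedding lengths differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / Math.sqrt(normA * normB);
}

// Score every section against the query, drop weak matches,
// and keep the best `topK` in descending score order.
// Assumed section shape: { title: string, embedding: number[] }.
function rankSections(queryEmbedding, sections, threshold = 0.3, topK = 3) {
  return sections
    .map((s) => ({ title: s.title, score: cosineSimilarity(queryEmbedding, s.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Lowering `threshold` returns more, weaker matches; raising it returns fewer, stronger ones.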
Keyword Matching ❌
- Query: "machine learning" → Only finds exact phrase
- Misses: "artificial intelligence", "neural networks"
AI Semantic Matching ✅
- Query: "machine learning" → Finds:
  - "Artificial Intelligence"
  - "Deep Learning Models"
  - "Neural Network Training"
  - Any semantically related content!
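To make the limitation of keyword matching concrete, here is a minimal sketch of a literal, case-insensitive phrase search over section headings. The function name `keywordMatch` is hypothetical and this is not necessarily how the extension implements its keyword mode. It only finds headings that contain the query text itself.

```javascript
// Return the headings that contain the query as a literal,
// case-insensitive substring. Related wording is not found.
function keywordMatch(headings, query) {
  const needle = query.trim().toLowerCase();
  if (!needle) return [];
  return headings.filter((h) => h.toLowerCase().includes(needle));
}

const headings = [
  'Machine Learning Basics',
  'Neural Network Training',
  'Deep Learning Models',
];

// Only the first heading contains the exact phrase.
const hits = keywordMatch(headings, 'machine learning');
```

Here `hits` contains only "Machine Learning Basics"; the other two headings are related in meaning but share no literal phrase with the query, which is the gap embedding-based matching is meant to close.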
ai-pdf-extractor/
├── manifest.json # Extension configuration
├── background.js # Service worker
├── content.js # PDF parsing & AI integration
├── ai-model.js # TensorFlow.js AI model
├── popup.html # Extension UI
├── popup.css # Styling
├── popup.js # UI logic
├── icons/ # Extension icons
└── README.md # This file
Edit ai-model.js to customize:
// Similarity threshold (0.0 to 1.0)
const SIMILARITY_THRESHOLD = 0.3; // Default: 30%
// Number of results to return
const TOP_K_RESULTS = 3; // Default: 3
// Model URL (change to use different model)
const MODEL_URL = 'https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder@1.3.3';

- Click "Voice Query" button
- Say: "copy introduction" or "copy methodology"
- AI finds and copies the content
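A spoken command such as "copy introduction" has to be reduced to a search query before matching. The sketch below shows one way this might be done on the recognized transcript. The function name `parseVoiceCommand` and the handling of an optional "the" are assumptions for illustration; the speech recognition itself is not shown.

```javascript
// Extract the search query from a transcript like "Copy the methodology."
// Returns null when the transcript is not a "copy ..." command.
function parseVoiceCommand(transcript) {
  const text = transcript
    .trim()
    .toLowerCase()
    .replace(/[.!?]+$/, ''); // drop trailing punctuation added by recognizers
  const match = text.match(/^copy\s+(?:the\s+)?(.+)$/);
  return match ? match[1].trim() : null;
}
```

The returned string can then be passed to the same search path as a typed query, and a `null` result can fall back to the text input.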
- Extension shows all detected sections
- Click any heading to copy that section instantly
- No typing required!
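Listing sections requires first deciding which lines of the extracted text are headings. The following is a rough heuristic sketch, not the parser in `content.js`: the patterns ("Chapter N", numbered headings, and a few common section names) and the 80-character limit are assumptions chosen for illustration, and real PDFs will need more signals such as font size.

```javascript
// Heuristic: does this line look like a section heading?
function isHeading(line) {
  const text = line.trim();
  if (text.length === 0 || text.length > 80) return false;
  return (
    /^chapter\s+\d+/i.test(text) ||                 // "Chapter 3 Results"
    /^\d+(\.\d+)*\.?\s+[A-Z]/.test(text) ||         // "2. Methods", "1.2 Setup"
    /^(abstract|introduction|methods|methodology|results|discussion|conclusions?|references)$/i.test(text)
  );
}

// Group lines under the most recent heading.
// Text before the first heading is ignored in this sketch.
function splitIntoSections(lines) {
  const sections = [];
  let current = null;
  for (const line of lines) {
    if (isHeading(line)) {
      current = { heading: line.trim(), body: [] };
      sections.push(current);
    } else if (current && line.trim() !== '') {
      current.body.push(line.trim());
    }
  }
  return sections.map((s) => ({ heading: s.heading, text: s.body.join(' ') }));
}
```

Each returned object pairs a heading with its text, which is the shape a clickable section list would need.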
Instead of exact matches, use natural language:
- "section about experimental methods"
- "part discussing the results"
- "anything related to machine learning"
| Metric | Value |
|---|---|
| First load | 3-10 seconds (model download) |
| Subsequent loads | Instant (cached) |
| Query processing | 100-300ms |
| Accuracy | 85%+ for semantic matches |
| Memory usage | ~90MB with model loaded |
- ✅ 100% Local Processing - AI runs in your browser
- ✅ No Data Transmission - PDF content and queries are not sent to external servers; the model itself is downloaded once on first load
- ✅ Offline Capable - Works without internet (after first load)
- ✅ No Tracking - Zero telemetry or analytics
- ✅ Open Source - Fully auditable code
| Browser | Status |
|---|---|
| Chrome 88+ | ✅ Full support |
| Edge 88+ | ✅ Full support |
| Brave | ✅ Full support |
| Opera | ✅ Full support |
| Vivaldi | ✅ Full support |
| Firefox | ❌ Not yet supported |
- PDF might be image-based (scanned)
- Try PDFs with text layers
- Refresh page and reopen extension
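An image-based (scanned) PDF has little or no text layer, so a quick check on the extracted text can explain why nothing was detected. This is a minimal sketch under assumptions: `looksScanned` is a hypothetical helper, `pageTexts` is assumed to be an array with one extracted string per page, and the 20-character cutoff is an arbitrary illustrative value.

```javascript
// Guess whether a PDF is image-based by measuring how much
// non-whitespace text was extracted per page on average.
function looksScanned(pageTexts, minCharsPerPage = 20) {
  if (pageTexts.length === 0) return true; // nothing extracted at all
  const totalChars = pageTexts.reduce(
    (sum, text) => sum + text.replace(/\s+/g, '').length,
    0
  );
  return totalChars / pageTexts.length < minCharsPerPage;
}
```

When this returns `true`, the extension could show a hint to try a PDF with a text layer instead of reporting zero sections.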
- Try simpler keywords: "intro" instead of "introduction section"
- Browse sections list to see what's available
- Lower threshold in settings
- Check internet connection (first load only)
- Clear browser cache and reload
- Check browser console for errors
- Grant microphone permissions
- Check browser microphone settings
- Use text input as alternative
- Basic knowledge of JavaScript
- Understanding of browser extensions
- Chrome/Edge browser
# Clone repository
git clone https://github.com/yourusername/ai-pdf-extractor.git
cd ai-pdf-extractor
# Make changes to code files
# Test in browser
# 1. Go to chrome://extensions/
# 2. Enable Developer mode
# 3. Click "Load unpacked"
# 4. Select project folder
# 5. Test changes
# 6. Click reload icon on extension card after changes

- manifest.json: Extension metadata, permissions, and configuration
- background.js: Service worker for clipboard operations
- content.js: Injected into PDF pages, handles parsing and AI
- ai-model.js: TensorFlow.js integration and semantic search
- popup.html/css/js: Extension popup interface
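The files above have to talk to each other: the popup asks the content script for sections, and the content script answers. The sketch below shows one possible shape for that exchange. The message types `LIST_SECTIONS` and `GET_SECTION`, the `handleMessage` helper, and the `detectedSections` variable are all hypothetical; only `chrome.runtime.onMessage.addListener` is a real browser API.

```javascript
// Pure request handler so the logic can be tested without a browser.
// Assumed section shape: { heading: string, text: string }.
function handleMessage(message, sections) {
  switch (message.type) {
    case 'LIST_SECTIONS':
      return { ok: true, headings: sections.map((s) => s.heading) };
    case 'GET_SECTION': {
      const section = sections.find((s) => s.heading === message.heading);
      return section
        ? { ok: true, text: section.text }
        : { ok: false, error: 'Section not found' };
    }
    default:
      return { ok: false, error: `Unknown message type: ${message.type}` };
  }
}

// In content.js this could be wired up roughly as follows
// (detectedSections is a hypothetical variable holding parsed sections):
//
// chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
//   sendResponse(handleMessage(message, detectedSections));
// });
```

Keeping the handler separate from the listener registration makes it easy to exercise in a plain Node test.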
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Research - Universal Sentence Encoder model
- TensorFlow.js Team - Browser ML framework
- Open Source Community - Inspiration and support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Firefox compatibility (Manifest V2)
- OCR support for scanned PDFs
- Multi-language support
- Custom model training interface
- Export to various formats (JSON, Markdown)
- Cloud sync for saved queries
- Summarization feature
- Question answering over PDF content
Made with ❤️ and 🤖 AI for smarter PDF management
Star ⭐ this repo if you find it useful!