NeuraSnip is a semantic image search engine that understands what you're looking for, not just keywords. Search your personal photo collection using natural language queries like "sunset on beach", "person smiling", or "coffee shop receipt".
- Semantic Search - Search using natural language descriptions
- Image-to-Image Search - Upload an image to find similar ones
- Hybrid Search - Combine a text query with a reference image for more precise results
- OCR Integration - Search text within images
- Beautiful UI - Clean, modern Streamlit interface
- Fast Indexing - Batch processing with progress tracking
- Vector Database - Efficient FAISS-based storage
- Smart Filters - Color detection and filtering
- Random Explorer - Discover forgotten images
# Python 3.8 or higher
python --version
# Git (for cloning)
git --version
# 1. Clone the repository
git clone https://github.com/yourusername/neurasnip.git
cd neurasnip
# 2. Create virtual environment
python -m venv venv
# 3. Activate virtual environment
# Windows (Git Bash):
source venv/Scripts/activate
# Windows (CMD):
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate
# 4. Install dependencies
pip install -r requirements.txt
# 5. Install Tesseract OCR (for text extraction)
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Linux: sudo apt-get install tesseract-ocr
# Mac: brew install tesseract
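Before indexing, a quick sanity check can confirm the toolchain is wired up (an optional sketch; it only assumes the dependencies installed above):

```python
# verify_setup.py - optional sanity check for the toolchain
import clip          # openai/CLIP
import faiss         # noqa: F401 (the import itself is the test)
import pytesseract
import streamlit     # noqa: F401
import torch

print("CLIP models:", clip.available_models())
print("CUDA available:", torch.cuda.is_available())
print("Tesseract version:", pytesseract.get_tesseract_version())
```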
# 1. Configure your image folder path
# Edit the path in src/indexer/image_indexer.py (line ~21):
images_folder: str = r"D:\YOUR_IMAGES_FOLDER"
# 2. Index your images
python -m src.indexer.image_indexer
# 3. Launch the web UI
streamlit run app.py
# 4. Open browser at http://localhost:8501
Search using natural language descriptions:
# Example queries:
"sunset on beach"
"person wearing blue shirt"
"coffee shop receipt"
"cat sleeping on couch"
"document with text"
"group photo at party"Upload a reference image to find similar ones:
Upload a reference image to find similar ones:
# Use cases:
- Find duplicates (see the sketch below)
- Find all photos from a location
- Find similar compositions
- Match color palettes
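For the duplicate use case, a simple approach is to flag image pairs whose CLIP embeddings are nearly identical. A sketch, where the 0.97 threshold is an assumption to tune rather than a project constant:

```python
# near_duplicate_sketch.py - cosine similarity between two CLIP embeddings
import numpy as np

def is_near_duplicate(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.97) -> bool:
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b)) >= threshold
```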
Combine text description + reference image:
# Example:
Text: "person at landmark"
Image: [upload photo of Taj Mahal]
Result: All photos of people at the Taj Mahal
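One way to form such a hybrid query is a weighted sum of the two normalized embeddings, searched against the same index. A sketch that builds on the text-search example above (the 0.7/0.3 weights mirror the Python API example later in this README):

```python
# hybrid_query_sketch.py - combine text and image query embeddings
import numpy as np

def hybrid_query(text_emb: np.ndarray, image_emb: np.ndarray,
                 text_weight: float = 0.7, image_weight: float = 0.3) -> np.ndarray:
    q = text_weight * text_emb + image_weight * image_emb
    return q / np.linalg.norm(q, axis=1, keepdims=True)  # renormalize for IndexFlatIP

# scores, ids = index.search(hybrid_query(text_emb, image_emb), 10)
```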
Discover forgotten images with one click:
# Perfect for:
- Rediscovering old photos
- Getting inspiration
- Random nostalgia trips
┌─────────────────────────────────────────────────┐
│                    Frontend                     │
│            Streamlit (Web Interface)            │
└─────────────────────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────┐
│                  Search Engine                  │
│  • Query Processing                             │
│  • Result Ranking                               │
│  • Hybrid Search Logic                          │
└─────────────────────────────────────────────────┘
                         ▼
┌──────────────────┬──────────────────────────────┐
│    CLIP Model    │       Vector Database        │
│    (ViT-B/32)    │     (FAISS IndexFlatIP)      │
│                  │                              │
│ • Text Encoding  │ • Fast Similarity Search     │
│ • Image Encoding │ • 512D Embeddings            │
└──────────────────┴──────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────┐
│                 Utilities Layer                 │
│  • Image Processor (PIL)                        │
│  • OCR Engine (Tesseract)                       │
│  • Color Detector                               │
└─────────────────────────────────────────────────┘
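The Vector Database box boils down to a flat inner-product index over 512-dimensional CLIP embeddings plus a metadata pickle. A minimal sketch of that storage layer, using the file names from the project layout below (everything else is illustrative):

```python
# vector_store_sketch.py - the storage layer in miniature
import os
import pickle
import faiss
import numpy as np

dim = 512                        # ViT-B/32 embedding size
index = faiss.IndexFlatIP(dim)   # exact inner-product search (cosine on unit vectors)

embeddings = np.random.rand(10, dim).astype("float32")
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
index.add(embeddings)

os.makedirs("data/vector_store", exist_ok=True)
faiss.write_index(index, "data/vector_store/images.index")
with open("data/vector_store/images_metadata.pkl", "wb") as f:
    pickle.dump([{"path": f"IMG_{i:04d}.jpg"} for i in range(10)], f)
```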
neurasnip/
├── app.py                       # Streamlit web interface
├── requirements.txt             # Python dependencies
├── README.md                    # This file
│
├── src/                         # Core modules
│   ├── __init__.py
│   │
│   ├── models/                  # Neural network models
│   │   ├── __init__.py
│   │   └── image_embeddings.py  # CLIP model wrapper
│   │
│   ├── database/                # Vector storage
│   │   ├── __init__.py
│   │   └── vector_db.py         # FAISS database
│   │
│   ├── indexer/                 # Image indexing
│   │   ├── __init__.py
│   │   └── image_indexer.py     # Batch indexer
│   │
│   ├── search/                  # Search engine
│   │   ├── __init__.py
│   │   └── search_engine.py     # Query processor
│   │
│   └── utils/                   # Utilities
│       ├── __init__.py
│       ├── image_processor.py   # Image handling
│       └── color_detector.py    # Color analysis
│
├── data/                        # Data storage
│   ├── images/                  # Sample images (optional)
│   ├── vector_store/            # Database files
│   │   ├── images.index         # FAISS index
│   │   └── images_metadata.pkl  # Metadata
│   └── logs/                    # Application logs
│
└── tests/                       # Unit tests (optional)
    └── test_search.py
Edit src/indexer/image_indexer.py:
def __init__(
    self,
    images_folder: str = r"D:\YOUR_IMAGES_FOLDER",   # ← Change this
    db_path: str = "data/vector_store/images.index",
    batch_size: int = 32,
    skip_duplicates: bool = True,
):
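The same configuration can be passed programmatically. A sketch, assuming the indexer class in that module is named ImageIndexer (the class name is an assumption; the constructor arguments are the ones shown above):

```python
# index_programmatically.py - sketch; ImageIndexer is an assumed class name
from src.indexer.image_indexer import ImageIndexer

indexer = ImageIndexer(
    images_folder=r"D:\YOUR_IMAGES_FOLDER",
    db_path="data/vector_store/images.index",
    batch_size=32,
    skip_duplicates=True,
)
```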
Change the CLIP model in src/models/image_embeddings.py:
# Available models:
"ViT-B/32" # Fast, 512D (default)
"ViT-B/16" # Better quality, 512D
"ViT-L/14" # Best quality, 768D (slower)Configure OCR language in src/utils/image_processor.py:
Configure the OCR language in src/utils/image_processor.py:
# English (default)
text = pytesseract.image_to_string(img, lang='eng')
# Other languages:
# French: lang='fra'
# Spanish: lang='spa'
# German: lang='deu'
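Multiple language packs can also be combined in a single call, which is standard pytesseract behaviour (the image path below is illustrative, and the extra language packs must be installed):

```python
from PIL import Image
import pytesseract

img = Image.open("receipt.jpg")                          # hypothetical file
text = pytesseract.image_to_string(img, lang="eng+fra")  # English + French in one pass
print(text)
```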
Indexing speed (approximate):
| Image Count | Batch Processing (size 32) | Single Processing |
|---|---|---|
| 100 images | ~45 seconds | ~2 minutes |
| 500 images | ~3 minutes | ~10 minutes |
| 1000 images | ~6 minutes | ~20 minutes |
Search latency:
- Text Query: < 100ms
- Image Query: < 200ms
- Hybrid Query: < 300ms
Tested on: Intel i7, 16GB RAM, No GPU
Track your image collection and search performance:
- Total images indexed
- Search history
- Database size
- Color distribution
- Performance metrics
- Refresh Database - Scan for new images
- Reindex All - Rebuild entire database
- Live Statistics - See progress in real-time
- Batch Processing - Fast indexing with progress bars
# Traditional keyword search:
Query: "cat"
Results: Only images with "cat" in the filename ❌
# NeuraSnip semantic search:
Query: "cat"
Results: All images containing cats, even if
the filename is "IMG_1234.jpg" ✅
# Works with complex descriptions:
"person wearing blue shirt at historic monument"
"sunset reflection on water with mountains"
"handwritten note on white paper"
"group of friends laughing outdoors"# Upload one beach photo
→ Finds ALL beach photos in your collection
→ Even with different angles, times, and locations
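The same CLIP/FAISS flow serves a reference image: encode it with the image tower and search the index. A sketch that reuses model, preprocess, device, and index from the text-search example earlier (the file name is illustrative):

```python
# image_search_sketch.py - query by example image
from PIL import Image
import numpy as np
import torch

def search_image(path: str, top_k: int = 10):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        emb = model.encode_image(img).float().cpu().numpy()
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    scores, ids = index.search(emb, top_k)
    return list(zip(ids[0].tolist(), scores[0].tolist()))

print(search_image("beach_reference.jpg"))
```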
"receipt from Starbucks"
"invoice dated 2024"
"handwritten phone number"
"screenshot with code"# Index with custom settings
python -m src.indexer.image_indexer --folder "E:\Photos" --batch-size 64
# Force reindex (skip duplicate check)
python -m src.indexer.image_indexer --skip-duplicates False
# Index specific folder
python -m src.indexer.image_indexer --folder "D:\Work\Screenshots"
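For reference, a command-line interface with these flags could be wired with argparse along these lines (a sketch, not necessarily how the project's entry point is implemented):

```python
# indexer_cli_sketch.py - one way to wire the flags shown above
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Index a folder of images")
    parser.add_argument("--folder", default=r"D:\YOUR_IMAGES_FOLDER",
                        help="Folder to index")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="Images encoded per batch")
    parser.add_argument("--skip-duplicates", type=lambda v: v.lower() != "false",
                        default=True, help="Skip files already in the index")
    return parser.parse_args()

if __name__ == "__main__":
    print(parse_args())
```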
from src import SearchEngine

# Initialize
engine = SearchEngine(db_path="data/vector_store/images.index")
# Text search
results = engine.search_by_text("sunset", top_k=10)
# Image search
results = engine.search_by_image("reference.jpg", top_k=10)
# Hybrid search
results = engine.search_hybrid(
    query_text="beach",
    query_image="reference.jpg",
    text_weight=0.7,
    image_weight=0.3
)
# Get statistics
stats = engine.get_statistics()
print(f"Total images: {stats['total_images']}")# Solution: Install CLIP
# Solution: Install CLIP
pip install git+https://github.com/openai/CLIP.git
# Windows: Add to PATH
C:\Program Files\Tesseract-OCR
# Or specify in code:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Use CPU instead (in image_embeddings.py):
self.device = "cpu" # Force CPU# Reindex your images
python -m src.indexer.image_indexer
# Click the "Refresh" button in the Streamlit sidebar
# Or reindex from command line
python -m src.indexer.image_indexer
Contributions are welcome! Please follow these steps:
# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/amazing-feature
# 3. Commit your changes
git commit -m "Add amazing feature"
# 4. Push to branch
git push origin feature/amazing-feature
# 5. Open a Pull Request
# Install dev dependencies
pip install -r requirements-dev.txt
This project is licensed under the MIT License.
- OpenAI CLIP - For the amazing vision-language model
- FAISS - For efficient vector similarity search
- Streamlit - For the beautiful web framework
- Tesseract - For OCR capabilities
- PIL/Pillow - For image processing
Ayush Kumar - https://www.linkedin.com/in/mr-ayush-kumar-004/
Project Link: https://github.com/Ayushkumar111/neurasnip
If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ and Python