🔍 NeuraSnip - Semantic Image Search Engine

Search your images using natural language, powered by OpenAI's CLIP model

🎯 What is NeuraSnip?

NeuraSnip is a semantic image search engine that understands what you're looking for, not just keywords. Search your personal photo collection using natural language queries like "sunset on beach", "person smiling", or "coffee shop receipt".

🌟 Key Features

Semantic Search - Search using natural language descriptions
Image-to-Image Search - Upload an image to find similar ones
Hybrid Search - Combine a text query and a reference image, with adjustable weights
OCR Integration - Search text within images
Beautiful UI - Clean, modern Streamlit interface
Fast Indexing - Batch processing with progress tracking
Vector Database - Efficient FAISS-based storage
Smart Filters - Color detection and filtering
Random Explorer - Discover forgotten images

🚀 Quick Start

Prerequisites

# Python 3.8 or higher
python --version

# Git (for cloning)
git --version

Installation

# 1. Clone the repository
git clone https://github.com/Ayushkumar111/neurasnip.git
cd neurasnip

# 2. Create virtual environment
python -m venv venv

# 3. Activate virtual environment
# Windows (Git Bash):
source venv/Scripts/activate
# Windows (CMD):
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# 4. Install dependencies
pip install -r requirements.txt

# 5. Install Tesseract OCR (for text extraction)
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Linux: sudo apt-get install tesseract-ocr
# Mac: brew install tesseract

First Run

# 1. Configure your image folder path
# Edit the path in src/indexer/image_indexer.py (line ~21):
images_folder: str = r"D:\YOUR_IMAGES_FOLDER"

# 2. Index your images
python -m src.indexer.image_indexer

# 3. Launch the web UI
streamlit run app.py

# 4. Open browser at http://localhost:8501

📖 Usage Guide

1️⃣ Text Search

Search using natural language descriptions:

[Screenshot: Natural language text search with relevance scores]

# Example queries:
"sunset on beach"
"person wearing blue shirt"
"coffee shop receipt"
"cat sleeping on couch"
"document with text"
"group photo at party"

2️⃣ Image Search

Upload a reference image to find similar ones:

[Screenshot: Upload any image to find visually similar matches]

# Use cases:
- Find duplicates
- Find all photos from a location
- Find similar compositions
- Match color palettes

3️⃣ Hybrid Search

Combine text description + reference image:

[Screenshot: Adjust text/image weights for precise control over search results]

# Example:
Text: "person at landmark"
Image: [upload photo of Taj Mahal]
Result: Photos of people at the Taj Mahal rank highest
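
One common way to implement weighted hybrid search is to blend the normalized text and image query embeddings into a single query vector before searching. The sketch below assumes that approach. The README does not say whether NeuraSnip blends embeddings or blends per-result scores, and `l2_normalize` and `blend_embeddings` are hypothetical helpers, not part of the project's API.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def blend_embeddings(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    text_weight: float = 0.7,
    image_weight: float = 0.3,
) -> List[float]:
    """Hypothetical helper: weighted sum of two unit vectors, re-normalized."""
    if len(text_vec) != len(image_vec):
        raise ValueError("embeddings must have the same dimension")
    t = l2_normalize(text_vec)
    i = l2_normalize(image_vec)
    mixed = [text_weight * a + image_weight * b for a, b in zip(t, i)]
    return l2_normalize(mixed)


if __name__ == "__main__":
    # Toy 2D vectors stand in for real 512D CLIP embeddings.
    print(blend_embeddings([1.0, 0.0], [0.0, 1.0]))
```

A higher `text_weight` pulls the blended query toward the text description; a higher `image_weight` pulls it toward the reference image.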

4️⃣ Random Explorer

Discover forgotten images with one click:

[Screenshot: Rediscover your photo collection with random sampling]

# Perfect for:
- Rediscovering old photos
- Getting inspiration
- Random nostalgia trips
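
Random sampling of this kind can be done with the standard library alone. The following is a minimal sketch, assuming the indexed image paths are available as a list; `pick_random_images` is a hypothetical name and the actual Random Explorer may work differently.

```python
import random
from typing import List, Optional


def pick_random_images(
    paths: List[str], count: int = 9, seed: Optional[int] = None
) -> List[str]:
    """Return up to `count` distinct paths chosen uniformly at random."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return rng.sample(paths, min(count, len(paths)))


if __name__ == "__main__":
    indexed = [f"IMG_{n:04d}.jpg" for n in range(100)]
    print(pick_random_images(indexed, count=5))
```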

🏗️ Architecture

Technology Stack

┌─────────────────────────────────────────────────┐
│                   Frontend                      │
│         Streamlit (Web Interface)               │
└─────────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────┐
│                Search Engine                    │
│  • Query Processing                             │
│  • Result Ranking                               │
│  • Hybrid Search Logic                          │
└─────────────────────────────────────────────────┘
                      ▼
┌──────────────────┬──────────────────────────────┐
│   CLIP Model     │    Vector Database           │
│   (ViT-B/32)     │    (FAISS IndexFlatIP)       │
│                  │                              │
│  • Text Encoding │  • Fast Similarity Search    │
│  • Image Encoding│  • 512D Embeddings           │
└──────────────────┴──────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────┐
│              Utilities Layer                    │
│  • Image Processor (PIL)                        │
│  • OCR Engine (Tesseract)                       │
│  • Color Detector                               │
└─────────────────────────────────────────────────┘
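
The diagram pairs CLIP embeddings with a FAISS `IndexFlatIP`, which is an exhaustive inner-product index. When vectors are scaled to unit length, the inner product equals cosine similarity. The sketch below is a pure-Python illustration of that computation, not FAISS code: `normalize` and `top_k_inner_product` are hypothetical names, and whether NeuraSnip normalizes its embeddings before indexing is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_inner_product(
    query: Sequence[float], stored: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query; return the k best (index, score) pairs."""
    q = normalize(query)
    scores = [
        (idx, sum(a * b for a, b in zip(q, normalize(vec))))
        for idx, vec in enumerate(stored)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy 2D vectors stand in for 512D embeddings.
    stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k_inner_product([2.0, 0.0], stored, k=2))
```

A flat index compares the query with every stored vector, so results are exact and search time grows linearly with the number of images.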

Project Structure

neurasnip/
├── app.py                          # Streamlit web interface
├── requirements.txt                # Python dependencies
├── README.md                       # This file
│
├── src/                           # Core modules
│   ├── __init__.py
│   │
│   ├── models/                    # Neural network models
│   │   ├── __init__.py
│   │   └── image_embeddings.py   # CLIP model wrapper
│   │
│   ├── database/                  # Vector storage
│   │   ├── __init__.py
│   │   └── vector_db.py          # FAISS database
│   │
│   ├── indexer/                   # Image indexing
│   │   ├── __init__.py
│   │   └── image_indexer.py      # Batch indexer
│   │
│   ├── search/                    # Search engine
│   │   ├── __init__.py
│   │   └── search_engine.py      # Query processor
│   │
│   └── utils/                     # Utilities
│       ├── __init__.py
│       ├── image_processor.py    # Image handling
│       └── color_detector.py     # Color analysis
│
├── data/                          # Data storage
│   ├── images/                    # Sample images (optional)
│   ├── vector_store/              # Database files
│   │   ├── images.index          # FAISS index
│   │   └── images_metadata.pkl   # Metadata
│   └── logs/                      # Application logs
│
└── tests/                         # Unit tests (optional)
    └── test_search.py

🔧 Configuration

Image Folder Path

Edit src/indexer/image_indexer.py:

def __init__(
    self,
    images_folder: str = r"D:\YOUR_IMAGES_FOLDER",  # ← Change this
    db_path: str = "data/vector_store/images.index",
    batch_size: int = 32,
    skip_duplicates: bool = True,
):

Model Selection

Change CLIP model in src/models/image_embeddings.py:

# Available models:
"ViT-B/32"   # Fast, 512D (default)
"ViT-B/16"   # Better quality, 512D
"ViT-L/14"   # Best quality, 768D (slower)
# Embeddings from different models are not comparable, so reindex after switching.

OCR Language

Configure OCR language in src/utils/image_processor.py:

# English (default)
text = pytesseract.image_to_string(img, lang='eng')

# Other languages:
# French: lang='fra'
# Spanish: lang='spa'
# German: lang='deu'
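
Once text has been extracted, it can be matched against a query. The README does not describe how NeuraSnip does this, so the sketch below assumes one simple approach: keep the OCR output per image and return the images whose text contains every word of the query. `tokenize` and `search_ocr_text` are hypothetical helpers.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_ocr_text(query: str, ocr_by_image: Dict[str, str]) -> List[str]:
    """Return image names whose OCR text contains every word of the query."""
    wanted = set(tokenize(query))
    if not wanted:
        return []
    return [
        name
        for name, text in ocr_by_image.items()
        if wanted <= set(tokenize(text))
    ]


if __name__ == "__main__":
    # In practice the values would come from pytesseract.image_to_string.
    extracted = {
        "a.jpg": "STARBUCKS Receipt #123",
        "b.jpg": "Invoice dated 2024",
    }
    print(search_ocr_text("starbucks receipt", extracted))
```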

Performance

Indexing Speed

Image Count    Batched (size 32)    One at a time
100 images     ~45 seconds          ~2 minutes
500 images     ~3 minutes           ~10 minutes
1000 images    ~6 minutes           ~20 minutes
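
Batched indexing means encoding several images per model call instead of one at a time. The sketch below shows only the batching step, splitting a list of paths into fixed-size chunks. `iter_batches` is a hypothetical helper, and the project's indexer may batch differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    paths = [f"IMG_{n:04d}.jpg" for n in range(70)]
    for batch in iter_batches(paths, batch_size=32):
        print(len(batch))  # each batch would be encoded in one model call
```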

Search Speed

Text Query: < 100ms
Image Query: < 200ms
Hybrid Query: < 300ms

Tested on: Intel i7, 16 GB RAM, no GPU

Dashboard & Statistics

Track your image collection and search performance:

[Screenshot: Comprehensive analytics about your image collection]

Features:

Total images indexed
Search history
Database size
Color distribution
Performance metrics

🔄 Database Management

Real-time Progress Tracking

[Screenshot: Real-time progress tracking during image indexing]

Features:

Refresh Database - Scan for new images
Reindex All - Rebuild entire database
Live Statistics - See progress in real-time
Batch Processing - Fast indexing with progress bars

Features Deep Dive

1. Semantic Understanding

# Traditional keyword search:
Query: "cat"
Results: Only images with "cat" in filename ❌

# NeuraSnip semantic search:
Query: "cat"
Results: All images containing cats, even if 
         filename is "IMG_1234.jpg" ✅

2. Natural Language Queries

# Works with complex descriptions:
"person wearing blue shirt at historic monument"
"sunset reflection on water with mountains"
"handwritten note on white paper"
"group of friends laughing outdoors"

3. Visual Similarity

# Upload one beach photo
→ Finds ALL beach photos in your collection
→ Even with different angles, times, locations

4. OCR Text Search

# Search text within images:
"receipt from Starbucks"
"invoice dated 2024"
"handwritten phone number"
"screenshot with code"

Advanced Usage

Command Line Indexing

# Index with custom settings
python -m src.indexer.image_indexer --folder "E:\Photos" --batch-size 64

# Force reindex (skip duplicate check)
python -m src.indexer.image_indexer --skip-duplicates False

# Index specific folder
python -m src.indexer.image_indexer --folder "D:\Work\Screenshots"
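
A flag that takes an explicit value, such as `--skip-duplicates False`, needs care with `argparse`: `type=bool` would treat any non-empty string, including "False", as true. The sketch below shows one way to parse the three flags used above. It is illustrative only; `str_to_bool` and `build_parser` are hypothetical, and the argument handling in the actual `image_indexer.py` may differ.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert common true/false spellings to a bool, rejecting anything else."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Index images (illustrative sketch)")
    parser.add_argument("--folder", default=None, help="folder to index")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--skip-duplicates", type=str_to_bool, default=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```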

Programmatic Usage

from src import SearchEngine

# Initialize
engine = SearchEngine(db_path="data/vector_store/images.index")

# Text search
results = engine.search_by_text("sunset", top_k=10)

# Image search
results = engine.search_by_image("reference.jpg", top_k=10)

# Hybrid search
results = engine.search_hybrid(
    query_text="beach",
    query_image="reference.jpg",
    text_weight=0.7,
    image_weight=0.3
)

# Get statistics
stats = engine.get_statistics()
print(f"Total images: {stats['total_images']}")

Troubleshooting

Issue: "No module named 'clip'"

# Solution: Install CLIP
pip install git+https://github.com/openai/CLIP.git

Issue: "Tesseract not found"

# Windows: add the Tesseract install folder to PATH, for example:
C:\Program Files\Tesseract-OCR

# Or specify in code:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
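
To check whether the executable is reachable before changing any code, the standard library's `shutil.which` can look it up on PATH. This is a small diagnostic sketch; `find_tesseract` is a hypothetical helper, not part of NeuraSnip.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "tesseract is not on PATH")
```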

Issue: "CUDA out of memory"

# Use CPU instead (in image_embeddings.py):
self.device = "cpu"  # Force CPU
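
Rather than hard-coding the device, it can be chosen at startup. The sketch below assumes the model runs on PyTorch, as the OpenAI CLIP package does; `pick_device` is a hypothetical helper and the device logic in the actual `image_embeddings.py` may differ.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda" when a GPU is usable and not overridden, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
    print(pick_device(force_cpu=True))
```

Lowering the indexing batch size is another way to reduce GPU memory use while staying on the GPU.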

Issue: "Database not found"

# Reindex your images
python -m src.indexer.image_indexer

Issue: "Images not appearing in search"

# Click the "Refresh" button in the Streamlit sidebar
# Or reindex from command line
python -m src.indexer.image_indexer

🤝 Contributing

Contributions are welcome! Please follow these steps:

# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/amazing-feature

# 3. Commit your changes
git commit -m "Add amazing feature"

# 4. Push to branch
git push origin feature/amazing-feature

# 5. Open a Pull Request

Development Setup

# Install dev dependencies
pip install -r requirements-dev.txt

License

This project is licensed under the MIT License.

Acknowledgments

OpenAI CLIP - For the amazing vision-language model
FAISS - For efficient vector similarity search
Streamlit - For the beautiful web framework
Tesseract - For OCR capabilities
PIL/Pillow - For image processing

Contact

Ayush Kumar - https://www.linkedin.com/in/mr-ayush-kumar-004/

Project Link: https://github.com/Ayushkumar111/neurasnip

If you find this project useful, please consider giving it a star! ⭐

Made with ❤️ and Python

