πŸ” NeuraSnip - Semantic Image Search Engine

Search your images using natural language, powered by OpenAI's CLIP model


🎯 What is NeuraSnip?

NeuraSnip is a semantic image search engine that understands what you're looking for, not just keywords. Search your personal photo collection using natural language queries like "sunset on beach", "person smiling", or "coffee shop receipt".

🌟 Key Features

  • Semantic Search - Search using natural language descriptions
  • Image-to-Image Search - Upload an image to find similar ones
  • Hybrid Search - Combine text + image for ultra-precise results
  • OCR Integration - Search text within images
  • Beautiful UI - Clean, modern Streamlit interface
  • Fast Indexing - Batch processing with progress tracking
  • Vector Database - Efficient FAISS-based storage
  • Smart Filters - Color detection and filtering
  • Random Explorer - Discover forgotten images

🚀 Quick Start

Prerequisites

# Python 3.8 or higher
python --version

# Git (for cloning)
git --version

Installation

# 1. Clone the repository
git clone https://github.com/Ayushkumar111/neurasnip.git
cd neurasnip

# 2. Create virtual environment
python -m venv venv

# 3. Activate virtual environment
# Windows (Git Bash):
source venv/Scripts/activate
# Windows (CMD):
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# 4. Install dependencies
pip install -r requirements.txt

# 5. Install Tesseract OCR (for text extraction)
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
# Linux: sudo apt-get install tesseract-ocr
# Mac: brew install tesseract

First Run

# 1. Configure your image folder path
# Edit the path in src/indexer/image_indexer.py (line ~21):
images_folder: str = r"D:\YOUR_IMAGES_FOLDER"

# 2. Index your images
python -m src.indexer.image_indexer

# 3. Launch the web UI
streamlit run app.py

# 4. Open browser at http://localhost:8501

📖 Usage Guide

1️⃣ Text Search

Search using natural language descriptions:

Natural language text search with relevance scores

# Example queries:
"sunset on beach"
"person wearing blue shirt"
"coffee shop receipt"
"cat sleeping on couch"
"document with text"
"group photo at party"

2️⃣ Image Search

Upload a reference image to find similar ones:

Upload any image to find visually similar matches

# Use cases:
- Find duplicates
- Find all photos from a location
- Find similar compositions
- Match color palettes

3️⃣ Hybrid Search

Combine text description + reference image:

Adjust text/image weights for precise control over search results

# Example:
Text: "person at landmark"
Image: [upload photo of Taj Mahal]
Result: Photos of people at the Taj Mahal rank highest
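
One common way to implement the text/image weighting described above is to blend the two query embeddings before searching. The following is a minimal sketch under that assumption; this README does not show how NeuraSnip actually combines the two signals, and `blend_queries` is a hypothetical helper operating on toy 2D vectors.

```python
import math

def _unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def blend_queries(text_vec, image_vec, text_weight=0.7, image_weight=0.3):
    """Weighted sum of unit-length text and image embeddings, re-normalized."""
    if len(text_vec) != len(image_vec):
        raise ValueError("embeddings must have the same dimension")
    total = text_weight + image_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    wt, wi = text_weight / total, image_weight / total
    mixed = [wt * t + wi * i for t, i in zip(_unit(text_vec), _unit(image_vec))]
    return _unit(mixed)

# Toy vectors standing in for CLIP embeddings.
query = blend_queries([1.0, 0.0], [0.0, 1.0], text_weight=0.7, image_weight=0.3)
```

With a higher text weight, the blended query points closer to the text embedding, so text-like matches rank higher.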

4️⃣ Random Explorer

Discover forgotten images with one click:

Rediscover your photo collection with random sampling

# Perfect for:
- Rediscovering old photos
- Getting inspiration
- Random nostalgia trips

πŸ—οΈ Architecture

Technology Stack

┌─────────────────────────────────────────────────┐
│                   Frontend                      │
│         Streamlit (Web Interface)               │
└─────────────────────────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────┐
│                Search Engine                    │
│  • Query Processing                             │
│  • Result Ranking                               │
│  • Hybrid Search Logic                          │
└─────────────────────────────────────────────────┘
                      ▼
┌──────────────────┬──────────────────────────────┐
│   CLIP Model     │    Vector Database           │
│   (ViT-B/32)     │    (FAISS IndexFlatIP)       │
│                  │                              │
│  • Text Encoding │  • Fast Similarity Search    │
│  • Image Encoding│  • 512D Embeddings           │
└──────────────────┴──────────────────────────────┘
                      ▼
┌─────────────────────────────────────────────────┐
│              Utilities Layer                    │
│  • Image Processor (PIL)                        │
│  • OCR Engine (Tesseract)                       │
│  • Color Detector                               │
└─────────────────────────────────────────────────┘
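
The stack pairs CLIP embeddings with a FAISS IndexFlatIP index. A flat inner-product index scores the query against every stored vector by dot product, which equals cosine similarity when the vectors are L2-normalized. The sketch below reproduces that scoring in plain Python on toy 3D vectors to illustrate the idea; it uses neither FAISS nor CLIP, and `normalize` and `flat_ip_search` are hypothetical names.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def flat_ip_search(index_vectors, query, top_k=3):
    """Score the query against every stored vector; return (index, score) pairs."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(normalize(vec), q)), i)
        for i, vec in enumerate(index_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:top_k]]

# Toy "embeddings"; real CLIP ViT-B/32 vectors have 512 dimensions.
stored = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
hits = flat_ip_search(stored, [1.0, 0.05, 0.0], top_k=2)
```

Because every stored vector is scored, a flat index is exact; its cost grows linearly with the number of indexed images.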

Project Structure

neurasnip/
├── app.py                         # Streamlit web interface
├── requirements.txt               # Python dependencies
├── README.md                      # This file
│
├── src/                           # Core modules
│   ├── __init__.py
│   │
│   ├── models/                    # Neural network models
│   │   ├── __init__.py
│   │   └── image_embeddings.py    # CLIP model wrapper
│   │
│   ├── database/                  # Vector storage
│   │   ├── __init__.py
│   │   └── vector_db.py           # FAISS database
│   │
│   ├── indexer/                   # Image indexing
│   │   ├── __init__.py
│   │   └── image_indexer.py       # Batch indexer
│   │
│   ├── search/                    # Search engine
│   │   ├── __init__.py
│   │   └── search_engine.py       # Query processor
│   │
│   └── utils/                     # Utilities
│       ├── __init__.py
│       ├── image_processor.py     # Image handling
│       └── color_detector.py      # Color analysis
│
├── data/                          # Data storage
│   ├── images/                    # Sample images (optional)
│   ├── vector_store/              # Database files
│   │   ├── images.index           # FAISS index
│   │   └── images_metadata.pkl    # Metadata
│   └── logs/                      # Application logs
│
└── tests/                         # Unit tests (optional)
    └── test_search.py

🔧 Configuration

Image Folder Path

Edit src/indexer/image_indexer.py:

def __init__(
    self,
    images_folder: str = r"D:\YOUR_IMAGES_FOLDER",  # ← Change this
    db_path: str = "data/vector_store/images.index",
    batch_size: int = 32,
    skip_duplicates: bool = True,
):
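
The `skip_duplicates` parameter suggests the indexer avoids re-processing images it has already seen. How NeuraSnip detects duplicates is not documented here; one plausible approach is to compare content hashes, sketched below. `file_digest` and `find_new_files` are hypothetical helpers, and temporary files stand in for images.

```python
import hashlib
import os
import tempfile

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_new_files(paths, seen_digests):
    """Yield only paths whose content hash is not already in seen_digests."""
    for path in paths:
        d = file_digest(path)
        if d not in seen_digests:
            seen_digests.add(d)
            yield path

# Demo: a.jpg and b.jpg have identical bytes, so b.jpg is skipped.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.jpg": b"same bytes", "b.jpg": b"same bytes", "c.jpg": b"other"}
    paths = []
    for name, data in contents.items():
        p = os.path.join(tmp, name)
        with open(p, "wb") as fh:
            fh.write(data)
        paths.append(p)
    unique = [os.path.basename(p) for p in find_new_files(paths, set())]
```

Hashing catches byte-identical copies only; resized or re-encoded versions of the same photo would still be indexed separately.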

Model Selection

Change CLIP model in src/models/image_embeddings.py:

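# Available models:
"ViT-B/32"   # Fast, 512D (default)
"ViT-B/16"   # Better quality, 512D
"ViT-L/14"   # Best quality, 768D (slower)
# After switching models, reindex your images: embeddings from different
# models are not comparable, and ViT-L/14 also changes the vector size.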

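Because ViT-L/14 produces 768-dimensional vectors while the two ViT-B variants produce 512-dimensional ones, an index built at one size cannot store or search embeddings of the other. A small guard like the following can catch that mismatch early. This is a sketch: `EMBEDDING_DIMS` and `dims_match` are hypothetical names, not part of NeuraSnip.

```python
# Embedding sizes for the CLIP variants listed above.
EMBEDDING_DIMS = {"ViT-B/32": 512, "ViT-B/16": 512, "ViT-L/14": 768}

def dims_match(model_name, index_dim):
    """Return True if the model's embedding size equals the index dimension."""
    expected = EMBEDDING_DIMS.get(model_name)
    if expected is None:
        raise ValueError(f"unknown model: {model_name!r}")
    return expected == index_dim

ok_default = dims_match("ViT-B/32", 512)
ok_large = dims_match("ViT-L/14", 512)
```

Matching dimensions is necessary but not sufficient: ViT-B/32 and ViT-B/16 both emit 512D vectors, yet their embedding spaces differ, so a reindex is still needed when moving between them.
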
OCR Language

Configure OCR language in src/utils/image_processor.py:

# English (default)
text = pytesseract.image_to_string(img, lang='eng')

# Other languages:
# French: lang='fra'
# Spanish: lang='spa'
# German: lang='deu'
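
Tesseract also accepts several languages at once by joining their codes with `+` (for example `eng+fra`), provided the corresponding language data is installed. The helper below is a small sketch for building that argument; `build_lang_arg` is a hypothetical name.

```python
def build_lang_arg(languages):
    """Join Tesseract language codes with '+', e.g. ['eng', 'fra'] -> 'eng+fra'."""
    codes = [code.strip() for code in languages if code and code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)

lang = build_lang_arg(["eng", "fra"])
# With pytesseract installed and the language data available:
# text = pytesseract.image_to_string(img, lang=lang)
```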

Performance

Indexing Speed

| Image Count | Batch Size 32 | Single Processing |
|-------------|---------------|-------------------|
| 100 images  | ~45 seconds   | ~2 minutes        |
| 500 images  | ~3 minutes    | ~10 minutes       |
| 1000 images | ~6 minutes    | ~20 minutes       |
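
The batch-size timings reflect encoding several images per model call instead of one at a time. The chunking step itself can be sketched as follows; `batched_paths` is a hypothetical helper, and the actual indexer may group files differently.

```python
from itertools import islice

def batched_paths(paths, batch_size=32):
    """Yield successive lists of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(paths)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# 100 file names split into batches of 32; the last batch is smaller.
paths = [f"IMG_{n:04d}.jpg" for n in range(100)]
sizes = [len(batch) for batch in batched_paths(paths, batch_size=32)]
```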

Search Speed

  • Text Query: < 100ms
  • Image Query: < 200ms
  • Hybrid Query: < 300ms

Tested on: Intel i7, 16GB RAM, No GPU

Dashboard & Statistics

Track your image collection and search performance:

Comprehensive analytics about your image collection

Features:

  • Total images indexed
  • Search history
  • Database size
  • Color distribution
  • Performance metrics

🔄 Database Management

Real-time Progress Tracking

Real-time progress tracking during image indexing

Features:

  • Refresh Database - Scan for new images
  • Reindex All - Rebuild entire database
  • Live Statistics - See progress in real-time
  • Batch Processing - Fast indexing with progress bars

Features Deep Dive

1. Semantic Understanding

# Traditional keyword search:
Query: "cat"
Results: Only images with "cat" in filename ❌

# NeuraSnip semantic search:
Query: "cat"
Results: All images containing cats, even if
         filename is "IMG_1234.jpg" ✅

2. Natural Language Queries

# Works with complex descriptions:
"person wearing blue shirt at historic monument"
"sunset reflection on water with mountains"
"handwritten note on white paper"
"group of friends laughing outdoors"

3. Visual Similarity

# Upload one beach photo
→ Finds ALL beach photos in your collection
→ Even with different angles, times, locations

4. OCR Text Search

# Search text within images:
"receipt from Starbucks"
"invoice dated 2024"
"handwritten phone number"
"screenshot with code"

Advanced Usage

Command Line Indexing

# Index with custom settings
python -m src.indexer.image_indexer --folder "E:\Photos" --batch-size 64

# Force reindex (skip duplicate check)
python -m src.indexer.image_indexer --skip-duplicates False

# Index specific folder
python -m src.indexer.image_indexer --folder "D:\Work\Screenshots"
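
The flags above could be parsed along the following lines. This is a sketch, not the project's actual parser: the defaults are copied from the `__init__` signature in the Configuration section, and `str_to_bool` is a hypothetical helper. It is there because argparse's `type=bool` would treat the string `False` as true, which would silently break `--skip-duplicates False`.

```python
import argparse

def str_to_bool(value):
    """Convert 'true'/'false'-style strings to a real boolean."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

def build_parser():
    """Build a parser for the three flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Index images for search")
    parser.add_argument("--folder", default=None, help="folder to index")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--skip-duplicates", type=str_to_bool, default=True)
    return parser

args = build_parser().parse_args(
    ["--folder", r"E:\Photos", "--skip-duplicates", "False"]
)
```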

Programmatic Usage

from src import SearchEngine

# Initialize
engine = SearchEngine(db_path="data/vector_store/images.index")

# Text search
results = engine.search_by_text("sunset", top_k=10)

# Image search
results = engine.search_by_image("reference.jpg", top_k=10)

# Hybrid search
results = engine.search_hybrid(
    query_text="beach",
    query_image="reference.jpg",
    text_weight=0.7,
    image_weight=0.3
)

# Get statistics
stats = engine.get_statistics()
print(f"Total images: {stats['total_images']}")

Troubleshooting

Issue: "No module named 'clip'"

# Solution: Install CLIP
pip install git+https://github.com/openai/CLIP.git

Issue: "Tesseract not found"

# Windows: Add to PATH
C:\Program Files\Tesseract-OCR

# Or specify in code:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
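
To check whether the Tesseract binary is reachable before pointing `pytesseract` at it, a lookup like the one below can help. It is a sketch using only the standard library; `find_tesseract` is a hypothetical helper, and the Windows directory is the install path mentioned above.

```python
import os
import shutil

def find_tesseract(extra_dirs=()):
    """Return a path to the tesseract executable, or None if it is not found."""
    found = shutil.which("tesseract")  # searches the PATH
    if found:
        return found
    for directory in extra_dirs:
        for name in ("tesseract.exe", "tesseract"):
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None

path = find_tesseract([r"C:\Program Files\Tesseract-OCR"])
# If a path was found, it can be assigned to
# pytesseract.pytesseract.tesseract_cmd as shown above.
```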

Issue: "CUDA out of memory"

# Use CPU instead (in image_embeddings.py):
self.device = "cpu"  # Force CPU
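
Rather than hard-coding the device, the choice can be made at runtime with an explicit CPU override. The function below is a sketch: `pick_device` is a hypothetical helper and is not how `image_embeddings.py` is known to select its device.

```python
def pick_device(force_cpu=False):
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path to choose.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device(force_cpu=True)
```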

Issue: "Database not found"

# Reindex your images
python -m src.indexer.image_indexer

Issue: "Images not appearing in search"

# Click the "Refresh" button in the Streamlit sidebar
# Or reindex from command line
python -m src.indexer.image_indexer


🤝 Contributing

Contributions are welcome! Please follow these steps:

# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/amazing-feature

# 3. Commit your changes
git commit -m "Add amazing feature"

# 4. Push to branch
git push origin feature/amazing-feature

# 5. Open a Pull Request

Development Setup

# Install dev dependencies
pip install -r requirements-dev.txt

License

This project is licensed under the MIT License.


Acknowledgments

  • OpenAI CLIP - For the amazing vision-language model
  • FAISS - For efficient vector similarity search
  • Streamlit - For the beautiful web framework
  • Tesseract - For OCR capabilities
  • PIL/Pillow - For image processing

Contact

Ayush Kumar - https://www.linkedin.com/in/mr-ayush-kumar-004/

Project Link: https://github.com/Ayushkumar111/neurasnip


If you find this project useful, please consider giving it a star! ⭐


Made with ❤️ and Python

Report Bug • Request Feature