Search by meaning, not metadata
Halo is a multimodal Retrieval-Augmented Generation (RAG) system for semantic photo search with automatic album generation. Users can search their photo library by describing vibes and moods, or let AI automatically organize photos into intelligent albums.
- CLIP-based embeddings for images, text queries, and BLIP-generated captions
- Optional BLIP captioning and hybrid scoring for better vibe/mood recall
- LLM-powered query expansion plus optional explanation mode for search hits
- Search-by-example: upload a reference photo and retrieve visually similar shots
- Metadata filters (date range + GPS bounding box) to narrow the search space
- Local-only vector store (ChromaDB) for privacy-preserving retrieval
- AI-powered clustering: Automatically organizes photos into meaningful albums
- Three clustering methods:
- Visual: Groups photos by appearance similarity using K-means on CLIP embeddings
- Temporal: Groups photos by time periods based on capture dates
- Hybrid: Combines visual similarity + timestamps + GPS location for intelligent grouping
- LLM-generated titles: Creative album names and descriptions powered by Gemini/GPT
- Customizable parameters: Adjust target number of albums and minimum photos per album
- Persistent storage: Albums save to JSON and reload on app restart
- Intelligent storytelling: Uses the Gemini API to generate relevant stories for auto-generated photo albums
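The hybrid scoring mentioned above can be sketched as a weighted blend of CLIP image similarity and BLIP-caption similarity. This is a minimal illustration, not Halo's actual implementation; the `0.7` weight is an assumption for the sketch:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_score(query_emb, image_emb, caption_emb, alpha=0.7):
    """Blend image-embedding similarity with caption-embedding similarity.
    alpha is an assumed weight, not Halo's configured value."""
    return alpha * cosine_sim(query_emb, image_emb) + (1 - alpha) * cosine_sim(query_emb, caption_emb)
```

A photo whose caption matches the query's mood can outrank a photo that is only visually close, which is what improves vibe/mood recall.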
# Clone and navigate to project
git clone https://github.com/PeterMcMaster/Halo.git
cd Halo
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the project in editable mode so `halo` is on your Python path
pip install -e .

Create a .env file (or copy .env.example) and set your preferred LLM provider:
For Gemini (Free tier available):
LLM_PROVIDER=gemini
GEMINI_API_KEY=your-google-ai-studio-key
GEMINI_MODEL=gemini-1.5-flash

For OpenAI:
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key
OPENAI_MODEL=gpt-4o-mini

Get an API key from Google AI Studio (Gemini) or the OpenAI dashboard.
streamlit run src/halo/ui.py

The app will open in your browser at http://localhost:8501
- Go to the "Index Photos" section in the sidebar
- Enter the path to your photo folder
- Toggle "Generate BLIP captions" (recommended for better search)
- Click "Index Photos"
- Wait for completion (~1-2 seconds per photo with BLIP)
Text Search:
- Navigate to "Text Search" tab
- Enter a description: "moody nighttime cityscapes", "cozy indoor warm lighting"
- Toggle "LLM query expansion" for richer descriptions
- Optional: Apply date range or GPS filters
- Click "Run text search"
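When LLM query expansion is enabled, the short query is rewritten into a richer visual description before embedding. A hedged sketch of the idea — the prompt wording and the `llm` callable are illustrative assumptions, not Halo's actual code:

```python
def build_expansion_prompt(query: str) -> str:
    """Construct a prompt asking an LLM to enrich a vibe/mood query.
    The wording here is illustrative, not Halo's actual prompt."""
    return (
        "Rewrite the photo-search query below as a richer visual description. "
        "Mention lighting, colors, setting, and mood in one sentence.\n"
        f"Query: {query}"
    )

def expand_query(query: str, llm=None) -> str:
    """Call an LLM if one is provided; otherwise fall back to the raw query."""
    if llm is None:
        return query  # graceful fallback when no provider is configured
    return llm(build_expansion_prompt(query))
```

Falling back to the raw query keeps search working even when no API key is set.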
Search by Example:
- Navigate to "Search by Example" tab
- Upload a reference photo
- Click "Find similar"
- View visually similar images with similarity scores
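Under the hood, search-by-example reduces to a nearest-neighbor lookup over stored embeddings. A minimal NumPy sketch (in the app this lookup is delegated to ChromaDB):

```python
import numpy as np

def find_similar(query_emb: np.ndarray, index_embs: np.ndarray, top_k: int = 3):
    """Return (index, similarity) pairs for the top_k most similar vectors.
    Assumes embeddings are L2-normalized, so dot product == cosine similarity."""
    sims = index_embs @ query_emb
    order = np.argsort(sims)[::-1][:top_k]
    return [(int(i), float(sims[i])) for i in order]
```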
- Go to the "Albums" tab
- Choose Clustering Method:
- Visual: Groups similar-looking photos (beaches, mountains, portraits)
- Temporal: Groups by time periods (trips, events, seasons)
- Hybrid: Smart grouping using visual + temporal + location data (recommended)
- Set Target # of Albums (2-10)
- Set Min Photos/Album (2-10)
- Click "🎬 Generate Albums"
- Browse generated albums with AI-generated titles
Album Features:
- Each album has a creative title and description
- Photos displayed in grid layout
- Albums automatically saved to photos/albums.json
- Load previously generated albums with "Load Saved Albums"
The Streamlit UI can load a custom React results grid. Build once and Streamlit will serve the static assets:
cd react_components/result-grid
npm install
npm run build

For live development, run the dev server and point Streamlit to it:
npm run dev # at react_components/result-grid (defaults to http://localhost:5173)
export RESULT_GRID_DEV_URL=http://localhost:5173
streamlit run src/halo/ui.py

If RESULT_GRID_DEV_URL is unset, Streamlit will load the built bundle from react_components/result-grid/dist.
1. Feature Extraction
- Each photo is represented as a 512-dimensional CLIP embedding vector
- Optional temporal features from EXIF timestamps
- Optional spatial features from GPS coordinates
2. Clustering Algorithms
Visual Clustering:
- Uses K-means algorithm on CLIP embeddings
- Groups photos with similar visual content
- Ideal for collections with distinct visual themes
Temporal Clustering:
- Extracts capture timestamps from EXIF metadata
- Groups photos taken within similar time periods
- Simple time-based binning approach
- Ideal for organizing by trips and events
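One way to realize the time-based binning described above is gap-based grouping: sort capture times and start a new bin whenever the gap to the previous photo exceeds a threshold. A hedged sketch — the 2-day gap is an illustrative default, not Halo's actual threshold:

```python
from datetime import timedelta

def bin_by_time(timestamps, max_gap=timedelta(days=2)):
    """Group capture times into events: a new bin starts whenever the gap
    to the previous photo exceeds max_gap (assumed default: 2 days)."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    bins, current = [], [ordered[0]]
    for ts in ordered[1:]:
        if ts - current[-1] > max_gap:
            bins.append(current)
            current = [ts]
        else:
            current.append(ts)
    bins.append(current)
    return bins
```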
Hybrid Clustering:
- Combines multiple features into unified feature space:
- CLIP embeddings (70% weight)
- Normalized timestamp (15% weight)
- GPS coordinates (15% weight)
- Uses K-means on standardized combined features
- Most intelligent method for general photo libraries
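The weighted combination above can be sketched with scikit-learn: standardize each feature block, scale it by its weight, stack, and run K-means. The toy data and function name are illustrative, but the 70/15/15 weights come from the description:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def hybrid_features(clip_embs, times, gps, w=(0.70, 0.15, 0.15)):
    """Stack standardized CLIP, timestamp, and GPS features,
    weighted 70/15/15 as described above."""
    blocks = (np.asarray(clip_embs), np.asarray(times).reshape(-1, 1), np.asarray(gps))
    return np.hstack([weight * StandardScaler().fit_transform(b) for b, weight in zip(blocks, w)])

# Toy collection: two photos in New York, two in Los Angeles
feats = hybrid_features(
    clip_embs=np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]),
    times=[0.0, 0.1, 10.0, 10.1],  # normalized timestamps
    gps=np.array([[40.7, -74.0], [40.7, -74.0], [34.0, -118.2], [34.0, -118.2]]),
)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
```

Standardizing each block first keeps the 512-dimensional CLIP vector from drowning out the single timestamp and two GPS columns.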
3. LLM-Powered Naming
- Analyzes cluster metadata (dates, locations, photo count)
- Generates creative album titles (max 6 words)
- Creates descriptive 2-3 sentence summaries
- Falls back to generic names if LLM unavailable
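A sketch of the naming step: assemble the cluster metadata into an LLM prompt, with a generic fallback when no provider is configured. The prompt wording and helper names are assumptions for illustration:

```python
def album_prompt(dates: str, location: str, count: int) -> str:
    """Build a naming prompt from cluster metadata.
    The wording is an assumption, not Halo's actual prompt."""
    return (
        f"Name a photo album of {count} photos taken {dates} near {location}. "
        "Give a creative title of at most 6 words "
        "and a 2-3 sentence description."
    )

def fallback_name(index: int) -> str:
    """Generic name used when the LLM is unavailable."""
    return f"Album {index + 1}"
```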
4. Persistence
- Albums saved as JSON to photos/albums.json
- Includes all metadata: titles, descriptions, photo paths
- Reloadable across sessions
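The persistence step is plain JSON round-tripping. A minimal sketch; the title/description/photos schema is an assumed shape, not necessarily Halo's exact format:

```python
import json
from pathlib import Path

def save_albums(albums, path="photos/albums.json"):
    """Persist albums as JSON (assumed schema: title/description/photos)."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(albums, indent=2))

def load_albums(path="photos/albums.json"):
    """Reload previously generated albums, or an empty list if none exist."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []
```

Returning an empty list when the file is missing lets the app start cleanly on first run.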
| Method | Best For | Algorithm | Features Used |
|---|---|---|---|
| Visual | Similar-looking photos | K-means | CLIP embeddings only |
| Temporal | Events & trips | Time binning | Timestamps only |
| Hybrid | Smart organization | K-means | CLIP + Time + GPS |
Hybrid clustering is recommended as it produces the most meaningful albums by considering both visual content and contextual metadata.
If you need quick test photos:
source venv/bin/activate
# Download 40 sample photos from Picsum
python scripts/download_sample_photos.py --clean --limit 40
# Run smoke test
python scripts/smoke_test.py --folder photos/sample_dataset

Script Options:
- --limit: Number of images (1-100)
- --width/--height: Image dimensions
- --clean: Remove old photos first
- --no-expand: Skip LLM query expansion
Results written to smoke_results.json
See notebooks/evaluation.ipynb for:
- Ablation studies (CLIP-only vs hybrid scoring)
- Latency measurements across collection sizes
- UMAP visualization of embedding space
- Qualitative assessments for reports
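Latency measurements like those in the notebook can be reproduced with a small timing helper; this is a generic sketch, not the notebook's actual code, and the function name is an assumption:

```python
import time
import statistics

def measure_latency(fn, runs=5):
    """Time repeated calls to fn and return the median seconds per call.
    The median damps warm-up outliers such as first-call model loading."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```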
Launch via Jupyter/VS Code after activating the virtual environment:
jupyter notebook notebooks/evaluation.ipynb

| Collection Size | Albums Generated | Processing Time |
|---|---|---|
| 40 photos | 3-5 albums | ~10 seconds |
| 100 photos | 5-10 albums | ~20 seconds |
| 500 photos | 15-25 albums | ~60 seconds |
Processing time includes clustering and LLM name generation
- `torch`, `torchvision` - PyTorch for neural networks
- `transformers` - Hugging Face models (CLIP, BLIP)
- `chromadb` - Vector database for embeddings
- `pillow` - Image processing
- `streamlit` - Web UI framework
- `openai` - OpenAI API (GPT models)
- `google-generativeai` - Google Gemini API
- `numpy` - Numerical computing
- `scikit-learn` - Clustering algorithms (K-means, StandardScaler)
- `umap-learn` - Dimensionality reduction for visualization
- `matplotlib` - Plotting and visualization
- `python-dotenv` - Environment variable management
- `exifread` - Extract EXIF metadata from photos
- `tqdm` - Progress bars
See requirements.txt for complete dependency list with versions.