A powerful image search application that uses OpenAI's CLIP (Contrastive Language-Image Pre-training) model to enable semantic image search using natural language queries. Search through your image collection by describing what you're looking for in plain English.
- Semantic Image Search: Search images using natural language descriptions
- CLIP-Powered: Uses OpenAI's CLIP-ViT-Large model for accurate image-text matching
- Vector Database: ChromaDB for efficient similarity search
- RESTful API: FastAPI backend with automatic image indexing
- Modern UI: Streamlit-based web interface for easy interaction
- Automatic Indexing: Images are automatically indexed on first startup
- Similarity Scores: View how well each result matches your query
- Backend: FastAPI
- Frontend: Streamlit
- ML Model: OpenAI CLIP (ViT-Large-Patch14-336)
- Vector Database: ChromaDB
- Deep Learning: PyTorch, Transformers
- Image Processing: Pillow (PIL)
- Python 3.8 or higher
- pip (Python package manager)
- Virtual environment (recommended)
1. Clone the repository (or navigate to the project directory):
   cd clip-transformer
2. Create a virtual environment (if not already created):
   python -m venv venv
3. Activate the virtual environment:
   # On macOS/Linux:
   source venv/bin/activate
   # On Windows:
   venv\Scripts\activate
4. Install dependencies:
   pip install -r requirements.txt
clip-transformer/
├── api.py # FastAPI backend server
├── clip.py # CLIP model functions (indexing, search)
├── ui.py # Streamlit frontend interface
├── requirements.txt # Python dependencies
├── images/ # Directory containing images to search
├── chroma_db/ # ChromaDB database (auto-generated)
└── README.md # This file
Place all images you want to search in the images/ directory. Supported formats:
- .jpg / .jpeg
- .png
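To check which files in the folder would be picked up, a minimal sketch along these lines can help. It only assumes the extensions listed above; the helper name `list_supported_images` is hypothetical, and the case-insensitive matching is an assumption rather than something confirmed from the project code.

```python
from pathlib import Path

# Extensions listed in this README. Matching case-insensitively is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_supported_images(image_dir):
    """Return the sorted paths of files in image_dir with a supported extension."""
    root = Path(image_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `list_supported_images("images")` from the project root returns the candidate files, and anything missing from that list has an extension outside the supported set.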
In Terminal 1, start the FastAPI backend:
# Activate virtual environment (if not already activated)
source venv/bin/activate
# Start the server
uvicorn api:app --reload
The API will be available at: http://localhost:8000
Note: On first startup, the server will automatically:
- Load the CLIP model (this may take a few minutes)
- Check if ChromaDB collection is empty
- Index all images in the images/ folder if needed
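The "index only if empty" behaviour described above can be sketched as follows. This is an illustration, not the code in api.py: the function `index_if_empty` and its three callable arguments are hypothetical stand-ins for the ChromaDB collection count, the image listing, and the project's indexing routine.

```python
def index_if_empty(count_items, list_images, index_images):
    """Run indexing only when the collection holds no items.

    All three arguments are hypothetical callables standing in for the
    real ChromaDB collection and the project's indexing code.
    Returns the number of images that were indexed.
    """
    if count_items() > 0:
        return 0  # collection already populated, skip the expensive work
    images = list_images()
    if images:
        index_images(images)
    return len(images)
```

Guarding on the collection size is what keeps later restarts fast: the embedding pass over the whole folder runs only when the database starts out empty.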
In Terminal 2, start the Streamlit interface:
# Activate virtual environment (if not already activated)
source venv/bin/activate
# Start Streamlit
streamlit run ui.py
The UI will automatically open in your browser at: http://localhost:8501
- Enter a text description in the search box (e.g., "a cat", "people playing football", "sunset over mountains")
- Adjust the number of results using the slider (1-10)
- Click "Search"
- View the matching images with similarity scores
Health check endpoint.
Response:
{
"message": "Welcome to the image search API"
}
Search for images based on text query.
Request Body:
{
"query": "a photo of a cat",
"n_results": 5
}
Response:
{
"query": "a photo of a cat",
"results": [
{
"id": "0",
"filename": "cat_image.jpg",
"similarity": 0.8234
},
{
"id": "1",
"filename": "another_cat.jpg",
"similarity": 0.7891
}
]
}
Check API health status.
Response:
{
"status": "healthy"
}
Once the FastAPI server is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
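The search endpoint can also be called from a script. The sketch below uses only the standard library and mirrors the request and response bodies shown earlier. The route `/search` is an assumption (the README does not state the path, so confirm it in the Swagger UI), and the helper names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default uvicorn address from this README


def build_search_request(query, n_results=5, path="/search"):
    """Build a POST request with the JSON body the search endpoint expects.

    The "/search" route is an assumption; check the Swagger UI for the real one.
    """
    body = json.dumps({"query": query, "n_results": n_results}).encode("utf-8")
    return urllib.request.Request(
        API_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, n_results=5):
    """Send the request and return the "results" list from the JSON response."""
    with urllib.request.urlopen(build_search_request(query, n_results)) as resp:
        return json.load(resp)["results"]
```

With the backend running, `search("a photo of a cat", 3)` should return a list of dictionaries carrying `id`, `filename`, and `similarity`, matching the response example above.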
1. Image Indexing:
   - Images are processed through the CLIP model's image encoder
   - Each image is converted to a 768-dimensional embedding vector
   - Embeddings are stored in ChromaDB with metadata (filename)
2. Text Query Processing:
   - User's text query is processed through CLIP's text encoder
   - Text is converted to the same 768-dimensional embedding space
3. Similarity Search:
   - ChromaDB performs cosine similarity search
   - Returns top-k most similar images based on embedding distance
4. Results Display:
   - Images are displayed with similarity scores
   - Higher similarity scores indicate better matches
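The ranking step can be illustrated without CLIP or ChromaDB. The sketch below is a plain-Python stand-in, assuming toy vectors in place of the 768-dimensional embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, and the real search is delegated to ChromaDB rather than done by a linear scan like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, image_vecs, k):
    """Return the k (filename, similarity) pairs with the highest similarity."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in image_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

When a ChromaDB collection is configured for the cosine space, it reports a distance equal to 1 minus the cosine similarity, so a score like the one in the API response would presumably be derived as `1 - distance`. That conversion is an assumption about this project, not something the README states.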
- initialize_model(): Loads the CLIP model and processor
- initialize_chromadb(): Connects to ChromaDB
- index_images(): Processes and indexes images
- search_images(): Searches for images based on a text query
- FastAPI application setup
- CORS middleware configuration
- Startup event handler (auto-indexing)
- API endpoints for search operations
- Streamlit web interface
- Search input and results display
- Image rendering with similarity scores
Anubhav Mandarwal