VrityaCodeRishi/Clip-Image-Search


CLIP Image Search

An image search application that uses OpenAI's CLIP (Contrastive Language-Image Pre-training) model to find images from natural-language queries. Describe what you're looking for in plain English, and the app returns the closest matches from your image collection.

Features

  • Semantic Image Search: Search images using natural language descriptions
  • CLIP-Powered: Uses OpenAI's CLIP ViT-Large model (ViT-Large-Patch14-336) for image-text matching
  • Vector Database: ChromaDB for efficient similarity search
  • RESTful API: FastAPI backend with automatic image indexing
  • Modern UI: Streamlit-based web interface for easy interaction
  • Automatic Indexing: Images are automatically indexed on first startup
  • Similarity Scores: View how well each result matches your query

Tech Stack

  • Backend: FastAPI
  • Frontend: Streamlit
  • ML Model: OpenAI CLIP (ViT-Large-Patch14-336)
  • Vector Database: ChromaDB
  • Deep Learning: PyTorch, Transformers
  • Image Processing: Pillow (PIL)

Prerequisites

  • Python 3.8 or higher
  • pip (Python package manager)
  • Virtual environment (recommended)

Installation

  1. Clone the repository (or navigate to the project directory):

    cd clip-transformer
  2. Create a virtual environment (if not already created):

    python -m venv venv
  3. Activate the virtual environment:

    # On macOS/Linux:
    source venv/bin/activate
    
    # On Windows:
    venv\Scripts\activate
  4. Install dependencies:

    pip install -r requirements.txt

Project Structure

clip-transformer/
├── api.py              # FastAPI backend server
├── clip.py             # CLIP model functions (indexing, search)
├── ui.py               # Streamlit frontend interface
├── requirements.txt    # Python dependencies
├── images/             # Directory containing images to search
├── chroma_db/          # ChromaDB database (auto-generated)
└── README.md           # This file

Usage

1. Prepare Your Images

Place all images you want to search in the images/ directory. Supported formats:

  • .jpg / .jpeg
  • .png
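As a quick pre-flight check before starting the server, the sketch below lists which files in a folder carry one of the supported extensions. The function name `list_supported_images` is hypothetical, and the case-insensitive matching is an assumption; the project's own indexing code may filter files differently.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_supported_images(image_dir="images"):
    """Return the supported image files in image_dir, sorted by path.

    Matching is case-insensitive (an assumption), so "IMG.JPG" counts.
    Directories are skipped even if their name ends in an image extension.
    """
    folder = Path(image_dir)
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    if Path("images").is_dir():
        for path in list_supported_images("images"):
            print(path.name)
```

Anything not printed by this script (for example .gif or .webp files) falls outside the formats listed above.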

2. Start the Backend Server

In Terminal 1, start the FastAPI backend:

# Activate virtual environment (if not already activated)
source venv/bin/activate

# Start the server
uvicorn api:app --reload

The API will be available at: http://localhost:8000

Note: On first startup, the server will automatically:

  • Load the CLIP model (this may take a few minutes)
  • Check if ChromaDB collection is empty
  • Index all images in the images/ folder if needed
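The "index only if empty" behaviour described in this note can be sketched as follows. This is an illustration under stated assumptions, not the project's actual startup handler: `index_if_empty` and `embed_image` are hypothetical names, and `collection` stands for any object exposing ChromaDB-style `count()` and `add()` methods. The string ids ("0", "1", ...) mirror the ids shown in the sample API response.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_if_empty(collection, image_dir, embed_image):
    """Index every supported image when the collection has no entries.

    collection  -- object with ChromaDB-style count() and add() methods
    embed_image -- hypothetical callable: file path -> list of floats
    Returns the number of images added (0 if indexing was skipped).
    """
    if collection.count() > 0:
        return 0  # already indexed on an earlier run

    paths = sorted(
        path for path in Path(image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    if not paths:
        return 0  # nothing to index

    collection.add(
        ids=[str(i) for i in range(len(paths))],
        embeddings=[embed_image(path) for path in paths],
        metadatas=[{"filename": path.name} for path in paths],
    )
    return len(paths)
```

One consequence of this design is that images added to the folder after the first run are not picked up while the collection is non-empty; under that assumption, deleting the auto-generated chroma_db/ directory would force a full re-index on the next startup.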

3. Start the Frontend UI

In Terminal 2, start the Streamlit interface:

# Activate virtual environment (if not already activated)
source venv/bin/activate

# Start Streamlit
streamlit run ui.py

The UI will automatically open in your browser at: http://localhost:8501

4. Search Images

  1. Enter a text description in the search box (e.g., "a cat", "people playing football", "sunset over mountains")
  2. Adjust the number of results using the slider (1-10)
  3. Click "Search"
  4. View the matching images with similarity scores

API Documentation

Endpoints

GET /

Root endpoint. Returns a welcome message and can double as a basic liveness check.

Response:

{
  "message": "Welcome to the image search API"
}

POST /search

Search for images based on text query.

Request Body:

{
  "query": "a photo of a cat",
  "n_results": 5
}

Response:

{
  "query": "a photo of a cat",
  "results": [
    {
      "id": "0",
      "filename": "cat_image.jpg",
      "similarity": 0.8234
    },
    {
      "id": "1",
      "filename": "another_cat.jpg",
      "similarity": 0.7891
    }
  ]
}
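For calling the endpoint from a script rather than the Streamlit UI, here is a minimal client sketch using only the standard library. The helper names (`build_search_request`, `parse_results`, `search`) are hypothetical; the URL, request body, and response fields are taken from the examples above. Error handling is left out.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default address from the Usage section


def build_search_request(query, n_results=5, base_url=API_URL):
    """Build (but do not send) a POST /search request with a JSON body."""
    body = json.dumps({"query": query, "n_results": n_results}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_results(payload):
    """Turn a /search response into (filename, similarity) pairs, best first."""
    results = payload.get("results", [])
    ranked = sorted(results, key=lambda r: r["similarity"], reverse=True)
    return [(r["filename"], r["similarity"]) for r in ranked]


def search(query, n_results=5, base_url=API_URL, timeout=30):
    """Send the request to a running server and return the parsed results."""
    request = build_search_request(query, n_results, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_results(json.load(response))


if __name__ == "__main__":
    # Requires the FastAPI server to be running locally.
    for filename, similarity in search("a photo of a cat", n_results=5):
        print(f"{similarity:.4f}  {filename}")
```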

GET /health

Check API health status.

Response:

{
  "status": "healthy"
}

Interactive API Documentation

Once the FastAPI server is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

How It Works

  1. Image Indexing:

    • Images are processed through the CLIP model's image encoder
    • Each image is converted to a 768-dimensional embedding vector
    • Embeddings are stored in ChromaDB with metadata (filename)
  2. Text Query Processing:

    • User's text query is processed through CLIP's text encoder
    • Text is converted to the same 768-dimensional embedding space
  3. Similarity Search:

    • ChromaDB performs cosine similarity search
    • Returns top-k most similar images based on embedding distance
  4. Results Display:

    • Images are displayed with similarity scores
    • Higher similarity scores indicate better matches
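The ranking step above can be illustrated with a brute-force sketch in plain Python. It uses toy 3-dimensional vectors in place of real 768-dimensional CLIP embeddings. The function names are hypothetical, and ChromaDB uses its own index rather than a linear scan, so this shows only the idea of cosine-similarity ranking, not the project's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, image_embeddings, k=5):
    """Rank images by similarity to the query embedding.

    image_embeddings -- dict mapping filename -> embedding vector
    Returns up to k (filename, similarity) pairs, best match first.
    """
    scored = [
        (name, cosine_similarity(query_embedding, vector))
        for name, vector in image_embeddings.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from CLIP's image encoder.
    index = {
        "cat.jpg": [1.0, 0.0, 0.0],
        "dog.jpg": [0.6, 0.8, 0.0],
        "car.jpg": [0.0, 0.0, 1.0],
    }
    query = [1.0, 0.0, 0.0]  # stands in for an encoded text query
    for name, score in top_k(query, index, k=2):
        print(f"{score:.4f}  {name}")
```

Because the text and image encoders project into the same space, a single similarity function can compare a query vector against every image vector. Cosine distance is one minus cosine similarity, so a smaller distance corresponds to a higher score.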

Code Structure

clip.py

  • initialize_model(): Loads CLIP model and processor
  • initialize_chromadb(): Connects to ChromaDB
  • index_images(): Processes and indexes images
  • search_images(): Searches for images based on text query

api.py

  • FastAPI application setup
  • CORS middleware configuration
  • Startup event handler (auto-indexing)
  • API endpoints for search operations

ui.py

  • Streamlit web interface
  • Search input and results display
  • Image rendering with similarity scores

Author

Anubhav Mandarwal

About

This project uses the CLIP transformer model to search a collection of images from a text description. CLIP maps text and images into the same vector space, so search works by comparing their vector embeddings.
