# Image Search Engine with CLIP Embeddings - Homework Assignment

![CLIP Architecture](https://github.com/openai/CLIP/raw/main/CLIP.png)

In this homework, you will implement an **Image Search Engine** using CLIP (Contrastive Language-Image Pre-training) embeddings on the Tiny-ImageNet validation dataset. Because CLIP maps images and text into a shared embedding space, the same image index can be searched with either text queries or image queries.

## 📌 Project Overview
- **Task**: Build a multimodal image search engine
- **Architecture**: Pre-trained CLIP model for feature extraction
- **Dataset**: Tiny-ImageNet validation set (10,000 images, 50 for each of 200 classes)
- **Goal**: Retrieve the most similar images for a given text or image query

## 📚 Learning Objectives
By completing this assignment, you will:
- Understand multimodal embeddings and their applications
- Learn to use pre-trained CLIP models for feature extraction
- Implement similarity search using cosine similarity
- Evaluate zero-shot classification performance
- Build a practical image retrieval system

## 1️⃣ Dataset Setup

**Task**: Download and explore the Tiny-ImageNet validation dataset.

**Requirements**:
- Install the `tinyimagenet` package
- Load the validation split of the dataset
- Explore the dataset structure and class labels
- Visualize sample images from different classes

In [None]:
# TODO: Install tinyimagenet package (uncomment the line below)
# !pip install tinyimagenet

# TODO: Import necessary libraries

# TODO: Load the validation dataset


# TODO: Print dataset information


# TODO: Display 5 sample images with their class information
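
The first few indices of a dataset may all share one label, so picking one index per class is a simple way to show five *different* classes. The helper below is a hypothetical sketch (`pick_one_per_class` is not part of the `tinyimagenet` package) and assumes you already have the integer label of every validation image as a sequence.

```python
from typing import Dict, List, Sequence


def pick_one_per_class(labels: Sequence[int], n_classes: int = 5) -> List[int]:
    """Return the index of the first sample for each of the first `n_classes` distinct labels.

    Hypothetical helper: `labels` is assumed to be the integer class label of each
    validation image, in dataset order.
    """
    first_index: Dict[int, int] = {}
    for idx, label in enumerate(labels):
        if label not in first_index:
            first_index[label] = idx
            if len(first_index) == n_classes:
                break  # stop as soon as enough distinct classes are found
    return list(first_index.values())


# Toy labels: samples 0 and 1 share class 0, so index 1 is skipped.
print(pick_one_per_class([0, 0, 1, 2, 1, 3, 4, 5]))  # → [0, 2, 3, 5, 6]
```

If the dataset yields `(image, label)` pairs, the chosen indices can be drawn side by side with `matplotlib.pyplot.subplots` and `imshow`, titling each panel with its class name.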

## 2️⃣ Import Libraries and Configuration

**Task**: Import all necessary libraries and set up the environment for CLIP.

**Requirements**:
- Import PyTorch, Hugging Face `transformers`, and any other libraries you need
- Load the pre-trained CLIP model and processor
- Set up device configuration (use a GPU if available)
- Configure parameters such as the batch size and the number of results (k) to return

In [None]:
# TODO: Import all necessary libraries


# TODO: Check device availability


# TODO: Load pre-trained CLIP model and processor


# TODO: Move model to device


# TODO: Print model information
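
A minimal configuration sketch, assuming the Hugging Face `transformers` implementation of CLIP. The checkpoint name `openai/clip-vit-base-patch32` is an assumption (the assignment does not fix one), and `SearchConfig`, `pick_device`, `load_clip`, and `count_parameters` are hypothetical helpers. PyTorch and `transformers` are imported inside the functions, so the device check falls back to CPU if PyTorch is missing and no weights are downloaded until `load_clip` is called.

```python
from dataclasses import dataclass


def pick_device() -> str:
    """Prefer CUDA, then Apple-silicon MPS, else CPU (also CPU if PyTorch is missing)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


@dataclass
class SearchConfig:
    # Assumed checkpoint: the assignment does not name a specific CLIP variant.
    model_name: str = "openai/clip-vit-base-patch32"
    batch_size: int = 64  # images per forward pass during extraction
    top_k: int = 5  # results returned per query
    device: str = "cpu"


def load_clip(cfg: SearchConfig):
    """Load a CLIP model and processor from Hugging Face (downloads weights on first use)."""
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(cfg.model_name).to(cfg.device).eval()
    processor = CLIPProcessor.from_pretrained(cfg.model_name)
    return model, processor


def count_parameters(model) -> int:
    """Total number of parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters())


cfg = SearchConfig(device=pick_device())
print(cfg)
```

After `model, processor = load_clip(cfg)`, printing `cfg.model_name` together with `count_parameters(model)` is one simple way to cover the "print model information" step.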

## 3️⃣ Feature Extraction from Dataset

**Task**: Extract CLIP embeddings for all images in the validation dataset.

**Requirements**:
- Process all validation images through CLIP
- Extract and store image embeddings
- L2-normalize the embeddings so that the dot product of two embeddings equals their cosine similarity
- Save the embeddings to disk so that searches do not have to re-encode the dataset

In [None]:
# TODO: Create function to extract image embeddings

#     for each batch of images:
#         # TODO: Collect batch of images

#         # TODO: Process batch through CLIP

#         # TODO: Print progress

#     # TODO: Concatenate all embeddings


# TODO: Extract embeddings for the validation dataset
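
One way to structure the batched extraction is to keep batching and normalization separate from the model call. The sketch below is hypothetical (`extract_embeddings`, `l2_normalize`, and `make_clip_image_encoder` are illustrative names); it assumes the dataset can be indexed to give PIL images and that `model.get_image_features` returns a plain tensor, as in `transformers` 4.x. Normalizing every row to unit length once, at extraction time, lets all later searches use a plain dot product as cosine similarity.

```python
from typing import Any, Callable, List, Sequence

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def extract_embeddings(
    items: Sequence[Any],
    encode_batch: Callable[[List[Any]], np.ndarray],
    batch_size: int = 64,
    verbose: bool = True,
) -> np.ndarray:
    """Encode `items` in batches and return an (N, D) array of unit-norm embeddings."""
    n = len(items)
    if n == 0:
        raise ValueError("no items to encode")
    chunks = []
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        batch = [items[i] for i in range(start, end)]  # collect one batch
        chunks.append(np.asarray(encode_batch(batch), dtype=np.float32))
        if verbose:
            print(f"Encoded {end}/{n} items")  # progress report
    return l2_normalize(np.concatenate(chunks, axis=0))


def make_clip_image_encoder(model, processor, device: str):
    """Wrap a Hugging Face CLIPModel/CLIPProcessor pair as a batch encoder (PIL images in)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    def encode(images: List[Any]) -> np.ndarray:
        inputs = processor(images=images, return_tensors="pt").to(device)
        with torch.no_grad():  # inference only, no gradients needed
            feats = model.get_image_features(**inputs)
        return feats.cpu().numpy()

    return encode
```

For example, `image_emb = extract_embeddings(images, make_clip_image_encoder(model, processor, device))`, where `images` is a list of PIL images (loading them without tensor transforms lets the processor apply CLIP's own resizing and normalization). `np.save` and `np.load` can then store and reload `image_emb`, so the 10,000 validation images are encoded only once.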


## 4️⃣ Zero-Shot Classification Evaluation

**Task**: Evaluate CLIP's zero-shot classification performance on Tiny-ImageNet.

**Requirements**:
- Create a text prompt for each of the 200 Tiny-ImageNet classes
    - Take the human-readable class names from the dataset's `words.txt` file, which maps each WordNet ID to its names
- Extract text embeddings for the class prompts
- Perform zero-shot classification using similarity matching
- Calculate and report accuracy metrics
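
Tiny-ImageNet identifies classes by WordNet IDs (wnids). The sketch below assumes `words.txt` holds one tab-separated line per wnid followed by a comma-separated list of names, for example `n01443537<TAB>goldfish, Carassius auratus`. The hypothetical helpers `load_wnid_names` and `build_prompts` keep the first name of each class and wrap it in the template `"a photo of a {}."`, a prompt style used in the CLIP paper. The order of `class_wnids` is an assumption you must verify: prompt *i* has to describe the class your loader labels *i*.

```python
import io
from typing import Dict, List, Sequence, TextIO


def load_wnid_names(words_file: TextIO) -> Dict[str, str]:
    """Map each WordNet ID to the first name listed for it in words.txt."""
    names: Dict[str, str] = {}
    for line in words_file:
        line = line.strip()
        if not line:
            continue
        # Assumed format: wnid<TAB>name, name, ...
        wnid, _, synonyms = line.partition("\t")
        names[wnid] = synonyms.split(",")[0].strip()
    return names


def build_prompts(
    class_wnids: Sequence[str],
    names: Dict[str, str],
    template: str = "a photo of a {}.",
) -> List[str]:
    """One prompt per class, in the order of class_wnids (falls back to the raw wnid)."""
    return [template.format(names.get(wnid, wnid)) for wnid in class_wnids]


# Toy stand-in for the real file, using the assumed format.
sample = io.StringIO("n01443537\tgoldfish, Carassius auratus\n")
print(build_prompts(["n01443537"], load_wnid_names(sample)))  # → ['a photo of a goldfish.']
```

Pass an open file object for `words.txt` from your dataset directory to `load_wnid_names`. Some first names are ambiguous or very generic, so trying other templates or names is a reasonable experiment.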

In [None]:
# TODO: Create text prompts for all classes


# TODO: Extract text embeddings for class prompts


# TODO: Perform zero-shot classification

# TODO: Calculate similarities and predictions

# TODO: Calculate accuracy
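
A sketch of the classification step, assuming image and text embeddings are both L2-normalized (so their matrix product is a cosine-similarity matrix) and that the rows of `text_emb` follow the dataset's label order. `make_clip_text_encoder`, `zero_shot_rank`, and `top_k_accuracy` are hypothetical names; the text encoder assumes the Hugging Face `CLIPModel`/`CLIPProcessor` pair.

```python
from typing import Sequence

import numpy as np


def make_clip_text_encoder(model, processor, device: str):
    """Wrap a Hugging Face CLIPModel/CLIPProcessor pair as a text encoder returning unit-norm rows."""
    import torch  # lazy import: only needed when actually encoding

    def encode(texts: Sequence[str]) -> np.ndarray:
        inputs = processor(
            text=list(texts), return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():
            feats = model.get_text_features(**inputs).cpu().numpy()
        return feats / np.linalg.norm(feats, axis=1, keepdims=True)

    return encode


def zero_shot_rank(image_emb: np.ndarray, text_emb: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return the indices of the top_k classes for each image, best first.

    Assumes both inputs are L2-normalized, so the matrix product is cosine similarity.
    """
    sims = image_emb @ text_emb.T  # shape (num_images, num_classes)
    return np.argsort(-sims, axis=1)[:, :top_k]


def top_k_accuracy(ranked: np.ndarray, labels: Sequence[int], k: int) -> float:
    """Fraction of images whose true label appears among the first k ranked classes."""
    labels = np.asarray(labels)
    return float(np.mean(np.any(ranked[:, :k] == labels[:, None], axis=1)))
```

With real data, `text_emb = make_clip_text_encoder(model, processor, device)(prompts)`, then `ranked = zero_shot_rank(image_emb, text_emb)`, and report `top_k_accuracy(ranked, labels, 1)` and `top_k_accuracy(ranked, labels, 5)`. CLIP's own `logits_per_image` multiplies these cosine similarities by a learned positive scale, which does not change the ranking, so it can be skipped when only predictions are needed.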



## 5️⃣ Image Search Engine Implementation

**Task**: Build functions to search for similar images using both text and image queries.

**Requirements**:
- Implement text-to-image search functionality
- Implement image-to-image search functionality
- Return the top-k most similar images
- Create visualization functions for search results

In [None]:
# TODO: Implement text-to-image search


# TODO: Implement image-to-image search


# TODO: Create visualization function
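
A minimal sketch of both search modes, assuming the dataset embeddings are already L2-normalized and that an encoder function maps a list of queries to an embedding array. `top_k_search`, `text_to_image_search`, `image_to_image_search`, and `show_results` are hypothetical names. `np.argpartition` finds the k best scores without sorting all N, and only those k are sorted afterwards; matplotlib is imported lazily inside the plotting helper.

```python
from typing import Any, Callable, List, Sequence, Tuple

import numpy as np


def top_k_search(query_emb: np.ndarray, index_emb: np.ndarray, k: int = 5) -> Tuple[np.ndarray, np.ndarray]:
    """Return (indices, scores) of the k index rows most similar to the query, best first.

    Assumes the rows of index_emb are L2-normalized; the query is normalized here.
    """
    q = query_emb / np.linalg.norm(query_emb)
    scores = index_emb @ q  # cosine similarity with every indexed image
    k = max(1, min(k, scores.shape[0]))
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k
    top = top[np.argsort(-scores[top])]  # sort only the k winners
    return top, scores[top]


def text_to_image_search(query: str, encode_text: Callable[[List[str]], np.ndarray],
                         index_emb: np.ndarray, k: int = 5):
    return top_k_search(np.asarray(encode_text([query]))[0], index_emb, k)


def image_to_image_search(image: Any, encode_image: Callable[[List[Any]], np.ndarray],
                          index_emb: np.ndarray, k: int = 5):
    return top_k_search(np.asarray(encode_image([image]))[0], index_emb, k)


def show_results(images: Sequence[Any], scores: Sequence[float], title: str = ""):
    """Plot retrieved images in one row, each titled with its cosine similarity."""
    import matplotlib.pyplot as plt  # lazy import: plotting is optional

    fig, axes = plt.subplots(1, len(images), figsize=(3 * len(images), 3), squeeze=False)
    for ax, img, score in zip(axes[0], images, scores):
        ax.imshow(img)
        ax.set_title(f"cos = {score:.3f}")
        ax.axis("off")
    fig.suptitle(title)
    fig.tight_layout()
    return fig
```

When an image-to-image query is itself one of the indexed validation images, its top hit is the query (cosine similarity close to 1), so you may want to request k + 1 results and drop that index.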


## 6️⃣ Testing with Custom Queries

**Task**: Test your search engine with custom text queries and web images.

**Requirements**:
- Test with 5 different text queries
- Download and test with 5 images from the web
- Display the top 5 most similar images for each query
- Analyze the quality of the retrieved results
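
A minimal sketch for two parts of this step, both hypothetical helpers. `fetch_image` downloads a web image with the standard library and opens it with Pillow; the browser-like `User-Agent` header is an assumption, since some hosts reject urllib's default one. `precision_at_k` (precision@k) turns "quality of retrieved results" into a number when a query corresponds to a known Tiny-ImageNet class, assuming you can look up the class label of each retrieved index.

```python
import io
import urllib.request
from typing import Sequence

import numpy as np


def fetch_image(url: str, timeout: float = 10.0):
    """Download an image and return it as an RGB PIL image (requires Pillow)."""
    from PIL import Image  # lazy import so the metric below works without Pillow

    # Assumed header: some servers refuse requests without a browser-like User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    return Image.open(io.BytesIO(data)).convert("RGB")


def precision_at_k(retrieved_labels: Sequence[int], expected_label: int, k: int = 5) -> float:
    """Fraction of the top-k retrieved images whose label matches the expected class."""
    top = np.asarray(retrieved_labels[:k])
    if top.size == 0:
        return 0.0
    return float(np.mean(top == expected_label))
```

For a text query such as a class name, `precision_at_k([labels[i] for i in idx], expected_label)` summarizes how many of the top 5 hits come from the intended class; for web images with no matching class, visual inspection of the results is still needed.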

In [None]:
# TODO: Test with text queries

In [None]:
# TODO: Test with image queries from the web


# TODO: Define image URLs for testing


## 📝 Evaluation Criteria

Your homework will be evaluated based on:

1. **Implementation Correctness (40%)**
   - Proper CLIP model loading and usage
   - Correct feature extraction for images and text
   - Working search functionality for both text and image queries
   - Accurate zero-shot classification implementation

2. **Search Results Quality (30%)**
   - Reasonable search results for text queries
   - Appropriate image-to-image search results
   - Correct similarity calculations and ranking
   - Zero-shot classification accuracy

3. **Code Quality (20%)**
   - Clean, readable code with proper comments
   - Efficient implementation with batch processing
   - Proper handling of errors and edge cases
   - Well-structured functions

4. **Testing and Demonstration (10%)**
   - Successful testing with custom queries
   - Clear visualization of search results
   - Proper documentation of testing process