
#Assignment 1: Submit a write-up on the following:

- Hugging Face agents

- Hugging Face pipeline for text generation

- HF inference endpoints

- Give feedback on the image generation and explore different models available on the Hugging Face website



#Assignment 2: Using OpenAI's CLIP Model for Image Captioning and Building an Image Search Engine

#Objective

##In this assignment, you will use OpenAI's CLIP (Contrastive Language-Image Pre-training) model to:
- Generate captions for 15 different images.
- Build a search engine for these images using a larger dataset of images.


##Part 1: Generate Captions for Images

##Part 2: Build an Image Search Engine


##Submission
Submit the following as a **Streamlit** app:

- Your Python code for generating captions and building the search engine.
- A report describing your approach, challenges faced, and how you overcame them.
- Screenshots of the interface and results.

##Evaluation Criteria

- Correctness and efficiency of the code.
- Clarity and completeness of the report.
- Usability and functionality of the search engine interface.

#Please don't use any Generative AI Models

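1- Hugging Face agents:
Hugging Face agents pair a language model with a set of tools, such as other models, search functions, or plain Python functions. Given a task in natural language, the model decides which tools to call and in what order, usually by generating code or structured tool calls that the framework then executes. The tools can be customized for different applications, and the underlying model can run locally or be called through a hosted API.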
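
The idea of a model choosing among tools can be illustrated without any Hugging Face code. The sketch below is a conceptual toy, not the Transformers or smolagents API: `choose_tool` is a keyword-based stand-in for the language model, and both tools, the registry `TOOLS`, and `run_agent` are hypothetical names.

```python
def word_count(text):
    """Tool: count whitespace-separated words."""
    return len(text.split())


def shout(text):
    """Tool: return the text in upper case."""
    return text.upper()


# The tool registry the "agent" may choose from: name -> (function, description)
TOOLS = {
    "word_count": (word_count, "count the words in a text"),
    "shout": (shout, "convert a text to upper case"),
}


def choose_tool(task):
    """Stand-in for the language model: pick a tool name from the task wording.

    A real agent would prompt an LLM with the task and the tool descriptions
    and parse the tool call out of the model's reply.
    """
    if "count" in task.lower():
        return "word_count"
    return "shout"


def run_agent(task, text):
    """Select a tool for the task, execute it on the text, and report both."""
    tool_name = choose_tool(task)
    tool_fn, _description = TOOLS[tool_name]
    return tool_name, tool_fn(text)
```

In an actual agent framework, the selection step is where the language model does its work; the registry and the execution step look broadly similar.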

2- Hugging Face pipeline for text generation:
The `pipeline` API in the `transformers` library wraps a pre-trained model and its tokenizer behind a single call. For the `text-generation` task, the user passes a prompt and receives generated text; tokenization, model inference, and decoding of the output are handled automatically.
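
A minimal sketch of the text-generation pipeline. The model name `gpt2` and the generation settings are illustrative assumptions, not choices made in this notebook, and `build_generation_kwargs` and `generate_text` are hypothetical helper names. The `transformers` import sits inside the function so that the model is only loaded when text is actually requested.

```python
def build_generation_kwargs(max_new_tokens=30, temperature=0.8):
    """Collect generation settings in one place (values are illustrative)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,  # cap on the number of generated tokens
        "do_sample": True,                 # sample instead of greedy decoding
        "temperature": temperature,        # lower values give more conservative text
    }


def generate_text(prompt, model_name="gpt2", **overrides):
    """Run a prompt through the text-generation pipeline and return the text."""
    # Imported lazily: loading transformers and the model weights is slow.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    outputs = generator(prompt, **build_generation_kwargs(**overrides))
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate_text("Hugging Face pipelines make it easy to")` downloads the model on first use and, with the pipeline's default settings, returns the prompt followed by the generated continuation.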

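3- HF inference endpoints:
Hugging Face Inference Endpoints are a managed service for deploying models from the Hugging Face Hub on dedicated cloud infrastructure. Each deployed model is exposed as an HTTPS API, so applications can send inputs and receive predictions without managing servers.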
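
A deployed endpoint is typically called with an authenticated POST request. The sketch below uses only the standard library; the endpoint URL and token are placeholders supplied by the caller, `build_endpoint_request` and `query_endpoint` are hypothetical helper names, and the `{"inputs": ...}` payload assumes the endpoint uses a default handler (custom handlers may expect a different shape).

```python
import json
import urllib.request


def build_endpoint_request(endpoint_url, token, prompt):
    """Build (but do not send) a POST request for a deployed endpoint."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # access token for the endpoint
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url, token, prompt, timeout=30):
    """Send the prompt to the endpoint and return the decoded JSON response."""
    request = build_endpoint_request(endpoint_url, token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the headers and payload easy to inspect before any network call is made.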

4- Image generation models on Hugging Face:
The Hugging Face Hub hosts many open image generation models, including Stable Diffusion variants, open reproductions such as DALL-E Mini, and fine-tunes that imitate the Midjourney style. They differ in photorealism, artistic style, and prompt adherence. Proprietary models such as DALL-E and Midjourney themselves are not distributed on the Hub. Model cards describe each model's capabilities and limitations and usually show example outputs, which makes comparison and experimentation easier.

In [1]:
!pip install gradio


In [7]:
from transformers import CLIPProcessor, CLIPModel
from PIL import Image, UnidentifiedImageError
import requests
import io
import torch
import gradio as gr

model = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')

urls = [
    "https://images.pexels.com/photos/27453216/pexels-photo-27453216/free-photo-of-two-people-walking-in-the-ocean-at-sunset.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27377810/pexels-photo-27377810/free-photo-of-girl-silhouette-at-taj-mahal-india.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27305400/pexels-photo-27305400/free-photo-of-coconut-oil-is-a-great-way-to-get-your-body-healthy.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27409345/pexels-photo-27409345/free-photo-of-peaceful-evening.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/15968922/pexels-photo-15968922/free-photo-of-woman-posing-under-tree-in-spring.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27420720/pexels-photo-27420720/free-photo-of-delhi-metro-subway-platform.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/20141600/pexels-photo-20141600/free-photo-of-photo-of-cups-standing-around-the-sink-by-the-window-in-a-kitchen.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/19190850/pexels-photo-19190850/free-photo-of-street-market-in-morocco.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/13804796/pexels-photo-13804796.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27215761/pexels-photo-27215761/free-photo-of-bramble.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/27402099/pexels-photo-27402099/free-photo-of-the-white-domes-of-a-mosque-are-seen-in-the-distance.jpeg?auto=compress&cs=tinysrgb&w=600&lazy=load",
    "https://images.pexels.com/photos/40465/pexels-photo-40465.jpeg?auto=compress&cs=tinysrgb&w=600",
    "https://images.pexels.com/photos/2102367/pexels-photo-2102367.jpeg?auto=compress&cs=tinysrgb&w=600",
    "https://images.pexels.com/photos/747079/pexels-photo-747079.jpeg?auto=compress&cs=tinysrgb&w=600",
]

# download and open images
images = []
for url in urls:
    try:
        # a timeout keeps one slow host from stalling the whole loop
        response = requests.get(url, timeout=30)
    except requests.RequestException:
        print(f"Failed to download image from {url}")
        continue
    if response.status_code == 200:
        image_data = io.BytesIO(response.content)
        try:
            image = Image.open(image_data).convert("RGB")
            images.append(image)
        except UnidentifiedImageError:
            print(f"Failed to open image from {url}")
    else:
        print(f"Failed to download image from {url}")

image_inputs = processor(images=images, return_tensors="pt", padding=True)

# compute L2-normalized image embeddings once, so each query only needs a text forward pass
with torch.no_grad():
    image_features = model.get_image_features(**image_inputs)
    image_features /= image_features.norm(p=2, dim=-1, keepdim=True)

def search_images(query):
    # preprocess the query
    inputs = processor(text=[query], return_tensors="pt", padding=True)

    # query embedding
    with torch.no_grad():
        query_features = model.get_text_features(**inputs)
        query_features /= query_features.norm(p=2, dim=-1, keepdim=True)

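    # cosine similarity between the query and every image (both are L2-normalized)
    similarity = torch.matmul(query_features, image_features.T).squeeze(0)

    # indices of the top 3 most similar images (fewer if fewer images loaded)
    k = min(3, len(images))
    top_k = similarity.topk(k).indices.tolist()

    # pad with None so the three Gradio image outputs always receive a value
    results = [images[i] for i in top_k]
    return results + [None] * (3 - len(results))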

# Gradio interface
interface = gr.Interface(
    fn=search_images,
    inputs=gr.Textbox(label="Enter your query"),
    outputs=[gr.Image(type="pil", label="Most relevant image 1"),
             gr.Image(type="pil", label="Most relevant image 2"),
             gr.Image(type="pil", label="Most relevant image 3")],
)

interface.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://ee841e1188572b85ec.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
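
Part 1 of the assignment asks for captions, which the search cell does not produce. CLIP does not generate text; it can only score how well a given caption matches an image. A minimal sketch under that constraint ranks a hand-written list of candidate captions for each image. The helper name `pick_captions` and the example captions in the usage comment are assumptions, and the function expects L2-normalized embeddings such as the normalized outputs of `get_image_features` and `get_text_features`.

```python
import torch


def pick_captions(image_features, text_features, captions):
    """Return the best-matching caption and its cosine similarity per image.

    image_features: (num_images, dim) L2-normalized image embeddings
    text_features:  (num_captions, dim) L2-normalized caption embeddings
    captions:       list of num_captions candidate strings
    """
    if text_features.shape[0] != len(captions):
        raise ValueError("one text embedding is required per caption")
    # For unit vectors the dot product equals the cosine similarity.
    similarity = image_features @ text_features.T  # (num_images, num_captions)
    best_scores, best_idx = similarity.max(dim=1)
    return [(captions[i], s) for i, s in zip(best_idx.tolist(), best_scores.tolist())]


# Usage with the objects defined in the notebook (not executed here):
#   candidates = ["a beach at sunset", "a busy street market", "a metro platform"]
#   text_inputs = processor(text=candidates, return_tensors="pt", padding=True)
#   with torch.no_grad():
#       text_features = model.get_text_features(**text_inputs)
#       text_features /= text_features.norm(p=2, dim=-1, keepdim=True)
#   for caption, score in pick_captions(image_features, text_features, candidates):
#       print(f"{caption} ({score:.3f})")
```

The quality of the result depends entirely on the candidate list: an image can only receive a caption that someone wrote in advance.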




In [None]:
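# Streamlit interface
# Streamlit runs a script file rather than a notebook cell, so this code
# (together with the imports, model, processor, and image loading from the
# cell above) is assumed to be saved to a file; "app.py" is a placeholder name.
import streamlit as st

st.title("Image Search Engine")
user_input = st.text_input("Enter a description:")

if user_input:
    # Use the description as the single text query
    captions = [user_input]

    # Score the query against every image with CLIP
    inputs = processor(text=captions, images=images, return_tensors='pt', padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_text has shape (num_texts, num_images); row 0 holds the
    # score of each image for the one query
    scores = outputs.logits_per_text[0]

    # Display the best matching image
    best_match_idx = scores.argmax().item()
    st.image(images[best_match_idx], caption=captions[0])

    # Show all images with their scores (optional)
    st.write("All images and their scores:")
    for i, image in enumerate(images):
        st.image(image, caption=f"Score: {scores[i].item():.4f}")

# Run from a separate notebook cell or a terminal, not from inside the script:
!streamlit run app.py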


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://34.125.192.182:8501
