# Assignment 1: Submit a write-up on the following:

### 1. Hugging Face Agents

A Hugging Face agent is an LLM-based helper that carries out tasks described in natural language, typically by calling tools.
Examples: image generation and sentence completion.

### 2. Hugging Face Pipeline for Text Generation
Hugging Face offers a `pipeline` feature that simplifies tasks such as text generation. The process involves:
- **Model Download**: The desired model is downloaded from the Hugging Face Hub.
- **Parameter Setting**: Generation parameters (for example, the maximum output length) are adjusted.
- **Tokenization**: The input text is tokenized to prepare it for the model.
- **Query Answering**: The downloaded model generates a response from the tokenized input.
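
The steps above can be sketched with the `pipeline` helper from `transformers`. This is a minimal sketch rather than part of the original write-up: the model name `gpt2`, the parameter values, and the helper name `generate_text` are assumptions chosen for illustration. The import sits inside the function so that defining it does not trigger a model download.

```python
# Illustrative defaults; the values are assumptions, not recommendations.
GENERATION_PARAMS = {
    "max_new_tokens": 30,  # upper bound on the number of generated tokens
    "do_sample": True,     # sample instead of using greedy decoding
    "temperature": 0.7,    # lower values make sampling more conservative
}


def generate_text(prompt, model_name="gpt2", **overrides):
    """Continue `prompt` with a text-generation pipeline (hypothetical helper)."""
    # Imported lazily: building the pipeline downloads the model weights.
    from transformers import pipeline

    # Model download: fetches the model and tokenizer from the Hugging Face Hub.
    generator = pipeline("text-generation", model=model_name)

    # Parameter setting: start from the defaults and apply any overrides.
    params = {**GENERATION_PARAMS, **overrides}

    # Tokenization and generation both happen inside the pipeline call.
    outputs = generator(prompt, **params)
    return outputs[0]["generated_text"]


# Example call (downloads the model on first use):
# print(generate_text("Hugging Face pipelines make it easy to"))
```

Keeping the parameters in one dictionary makes it easy to compare runs with different settings, for example `generate_text(prompt, temperature=1.0)`.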

### 3. HF Inference Endpoints
Hugging Face provides hosted inference APIs that let users run models remotely, without hosting them locally.
Examples:
- Sentence Completion
- Question Answering
- Summarization
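
A hosted model is typically called with an authenticated HTTP POST request. The sketch below uses only the standard library and is an illustration under assumptions: the base URL `API_BASE` reflects the commonly documented hosted Inference API address and should be checked against the current Hugging Face documentation, and the helper names `build_request` and `query` are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the hosted Inference API; verify against current docs.
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id, text, token):
    """Build (but do not send) a POST request for a hosted model."""
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(model_id, text, token):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_request(model_id, text, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.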

### 4. Feedback on Image Generation and Exploring Different Models
Hugging Face also supports image generation through LLM agents.

# Assignment 2: Using OpenAI's CLIP Model for Image Captioning and Building an Image Search Engine

## Objective

In this assignment, you will use OpenAI's CLIP (Contrastive Language-Image Pre-training) model to:
- Generate captions for 15 different images.
- Build a search engine for these images using a larger dataset of images.
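
For the search objective, one possible approach is to embed the images and the text query with CLIP and rank the images by cosine similarity. The sketch below is an assumption-laden illustration, not the assignment's reference solution: the checkpoint `openai/clip-vit-base-patch32` and the helper names `cosine_similarity`, `rank_images`, and `embed_with_clip` are chosen for this example, and `embed_with_clip` assumes that `get_image_features` and `get_text_features` return plain tensors, as in `transformers` 4.x.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k most similar images."""
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(image_embeddings)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def embed_with_clip(images, query, model_name="openai/clip-vit-base-patch32"):
    """Embed PIL images and a text query with CLIP (downloads weights)."""
    # Imported lazily so the ranking helpers work without these packages.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
    # Convert to plain lists so rank_images can consume them directly.
    return text_emb[0].tolist(), image_emb.tolist()
```

A typical use would be `query_emb, image_embs = embed_with_clip(images, "popcorn")` followed by `rank_images(query_emb, image_embs)`, which returns the indices of the best-matching images.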


## Part 1: Generate Captions for Images

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [9]:
import os
from PIL import Image
import numpy as np
from matplotlib import pyplot as plt
import spacy
from transformers import BlipProcessor, BlipForConditionalGeneration

In [10]:
!python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")
!pip install transformers

Collecting en-core-web-md==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.7.1/en_core_web_md-3.7.1-py3-none-any.whl (42.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [11]:
def read_images(images_folder_path):
  """Open every file in the folder as a PIL image, display it, and return the list."""
  images = []

  for filename in os.listdir(images_folder_path):
    image_path = os.path.join(images_folder_path, filename)
    image = Image.open(image_path)
    images.append(image)

  # Display each image in its own figure.
  for image in images:
    plt.imshow(np.asarray(image))
    plt.show()

  return images

def generate_captions(images):
  """Generate one BLIP caption per image."""
  generated_captions = []

  processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
  model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

  for image in images:
    inputs = processor(images=image, return_tensors='pt')
    out = model.generate(**inputs)
    generated_captions.append(processor.batch_decode(out, skip_special_tokens=True)[0])

  # Return only after every image has been captioned.
  return generated_captions

def search_agent(generated_captions, query):
  """Return the caption whose spaCy vector is most similar to the query."""
  caption1 = nlp(query)

  max_similarity = 0
  most_similar_caption = ''

  for caption in generated_captions:
    caption2 = nlp(caption)
    similarity = caption1.similarity(caption2)

    if similarity > max_similarity:
      max_similarity = similarity
      most_similar_caption = caption

  # Return only after the query has been compared with every caption.
  return most_similar_caption

In [12]:
image_folder_path = '/content/drive/MyDrive/Images'

images = []
for filename in os.listdir(image_folder_path):
    image_path = os.path.join(image_folder_path, filename)
    image = Image.open(image_path)
    images.append(image)

# Display each image in its own figure.
for image in images:
    plt.imshow(np.asarray(image))
    plt.show()


In [13]:
from transformers import BlipProcessor, BlipForConditionalGeneration
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [14]:
generated_captions = []
for image in images:
  inputs = processor(images=image, return_tensors="pt")

  out = model.generate(**inputs)
  generated_captions.append(processor.batch_decode(out, skip_special_tokens=True)[0])



In [15]:
generated_captions

['a person holding a handful of strawberries',
 'a hand holding a cone of ice cream',
 'a woman is looking at a book shelf',
 'a monkey eating a banana',
 'a young boy feeding a deer through a fence',
 'a bowl of popcorn on a marble surface',
 'a small white dog running through a field',
 'a man sitting at a table',
 'a dog is playing in a pool with a frc',
 'a man sitting at a table with a laptop and a cell',
 'a large metal tower with a large olympic ring on top',
 'a man in a white uniform',
 'a group of cyclists riding down a road',
 'a young girl is sleeping on a bed',
 'a plate with a piece of las las las las las las las las las las las las las']

In [16]:
import spacy
!python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")



In [17]:
caption1 = nlp("popcorn")
max_similarity = 0
most_similar_caption = ""

for i in generated_captions:
  caption2 = nlp(i)
  similarity = caption1.similarity(caption2)
  if similarity > max_similarity:
    max_similarity = similarity
    most_similar_caption = i

print("Most similar caption:", most_similar_caption)
print("Similarity score:", max_similarity)


Most similar caption: a monkey eating a banana
Similarity score: 0.21598917770107018


In [18]:
caption1 = nlp("sleeper")
max_similarity = 0
most_similar_caption = ""

for i in generated_captions:
  caption2 = nlp(i)
  similarity = caption1.similarity(caption2)
  if similarity > max_similarity:
    max_similarity = similarity
    most_similar_caption = i

print("Most similar caption:", most_similar_caption)
print("Similarity score:", max_similarity)

Most similar caption: a young girl is sleeping on a bed
Similarity score: 0.4757575940243592


In [19]:
caption1 = nlp("eiffel tower")
max_similarity = 0
most_similar_caption = ""

for i in generated_captions:
  caption2 = nlp(i)
  similarity = caption1.similarity(caption2)
  if similarity > max_similarity:
    max_similarity = similarity
    most_similar_caption = i

print("Most similar caption:", most_similar_caption)
print("Similarity score:", max_similarity)

Most similar caption: a large metal tower with a large olympic ring on top
Similarity score: 0.3898588687711469


## Part 2: Build an Image Search Engine


## Submission
Submit the following as a **Streamlit** app:

- Your Python code for generating captions and building the search engine.
- A report describing your approach, challenges faced, and how you overcame them.
- Screenshots of the interface and results.

## Evaluation Criteria

- Correctness and efficiency of the code.
- Clarity and completeness of the report.
- Usability and functionality of the search engine interface.

# Please don't use any Generative AI Models