<a href="https://colab.research.google.com/github/codeREXus/langchain-learnings/blob/main/mini_projs/image_captioning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🖼️ Image Captioning with Salesforce BLIP-2

In this mini project, we build an **Image Captioning System** using the **Salesforce BLIP-2 (Bootstrapping Language-Image Pre-training v2)** model.  
BLIP-2 is a vision-language model that pairs a frozen image encoder with a frozen large language model, which lets it generate descriptive natural-language captions for input images.

We integrate the model with a simple **Gradio interface** to make it interactive and user-friendly. 🚀

---

## 📌 Project Overview
- Load the **BLIP-2 pre-trained model** (`blip2-opt-2.7b`) from Hugging Face Transformers.
- Process input images and feed them to the model.
- Generate a **natural language caption** describing the image.
- Deploy the model in a **Gradio app** for easy interaction.

---

## 🛠️ Tools & Libraries Used
- **[Transformers](https://huggingface.co/docs/transformers/index)** – For BLIP-2 model & processor  
- **[PyTorch](https://pytorch.org/)** – Deep learning backend  
- **[Gradio](https://www.gradio.app/)** – Web interface for interaction  



In [3]:
%%capture
!pip install transformers accelerate bitsandbytes torch gradio
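
Because `%%capture` hides pip's output, a failed install only surfaces later as an `ImportError`. As a minimal sketch, the standard-library check below lists any package that is still missing. The `REQUIRED` list and the `missing_packages` helper are illustrative names, and `accelerate` is included on the assumption that `device_map="auto"` depends on it.

```python
from importlib.util import find_spec

# Packages this notebook imports or relies on (illustrative list;
# accelerate is an assumption tied to device_map="auto")
REQUIRED = ["transformers", "accelerate", "bitsandbytes", "torch", "gradio"]


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerunning the install cell without `%%capture` shows pip's actual error.
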

In [None]:
import gradio as gr
import torch
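from transformers import BitsAndBytesConfig, Blip2Processor, Blip2ForConditionalGeneration

# Load processor & model
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    device_map="auto",      # let HF decide placement (needs accelerate)
    # 8-bit quantization via bitsandbytes saves VRAM; the bare load_in_8bit=True
    # argument is deprecated in favor of a quantization config
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)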

def generate_caption(image):
    # Prepare inputs on the model's device; float tensors are cast to float16
    # to match the non-quantized layers of the 8-bit model
    inputs = processor(images=image, return_tensors="pt").to(model.device, torch.float16)

    # Generate up to 30 new tokens and decode them into plain text
    outputs = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(outputs[0], skip_special_tokens=True).strip()
    return caption

def caption_image(image):
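    """
    Take a PIL image from Gradio and return its caption, or an error message.
    """
    # Gradio passes None when the form is submitted without an image
    if image is None:
        return "Please upload an image first."
    try:
        return generate_caption(image)
    except Exception as e:
        return f"An error occurred: {e}"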

# Gradio UI
iface = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Image Captioning with BLIP-2",
    description="Upload an image to generate a caption."
)

# Bind to 0.0.0.0 for Colab (127.0.0.1 won't work there)
iface.launch(server_name="0.0.0.0")
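
The right host for `launch` depends on where the notebook runs. The sketch below picks the launch arguments automatically. It assumes that Colab can be detected by the presence of the `google.colab` module and that a public `share` link is wanted there. `in_colab` and `launch_kwargs` are hypothetical helper names, not part of Gradio.

```python
import sys


def in_colab(modules=None):
    """Return True if the google.colab package is loaded (assumed Colab marker)."""
    loaded = sys.modules if modules is None else modules
    return "google.colab" in loaded


def launch_kwargs(colab):
    """Build keyword arguments for gr.Interface.launch()."""
    if colab:
        # Assumption: bind to all interfaces and request a public share link in Colab
        return {"server_name": "0.0.0.0", "share": True}
    # Local run: stay on the loopback interface
    return {"server_name": "127.0.0.1"}


print(launch_kwargs(in_colab()))
```

With these helpers, the launch call could read `iface.launch(**launch_kwargs(in_colab()))`, so the same cell works both locally and in Colab.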