### Automatic Image Captioning with BLIP

This notebook generates descriptive captions for a collection of images using BLIP (Bootstrapping Language-Image Pre-training), a vision-language model from Salesforce that is available through Hugging Face's transformers library. It applies the model to every image in a directory and saves one caption per image.

Key Components of the Code:

- Pre-trained BLIP Model:
    - The captioning model is Salesforce's pre-trained `Salesforce/blip-image-captioning-large` checkpoint.
    - The model is loaded with Hugging Face's transformers library and moved to the GPU (`cuda`) for faster processing.

- Directory-Based Image Captioning:
    - The script processes every image in a specified input directory (`image_directory`) and generates a caption for each one.
    - It picks up files ending in .jpg, .jpeg, or .png (case-insensitive); all other files are skipped.

- Caption Generation Process:
    - For each image, the script:
        1. Loads the image and converts it to RGB format.
        2. Preprocesses the image with the BLIP processor and feeds the result into the model to generate caption tokens.
        3. Decodes the generated tokens into text, dropping special tokens.

- Saving Captions:
    - Each caption is saved in its own text file, named after the corresponding image file but with a .txt extension (for example, `beach.jpg` produces `beach.txt`).
    - All caption files are stored in the specified output directory (`output_directory`), which is created if it does not exist.
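
The file selection and naming rules described above can be sketched in isolation, without loading the model. The helper names `is_supported_image` and `caption_path_for` are hypothetical and used only for illustration; the notebook itself performs the same checks inline.

```python
import os

# Extensions the notebook accepts, compared case-insensitively
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')

def is_supported_image(filename):
    """Return True if the filename ends in one of the supported extensions."""
    return filename.lower().endswith(IMAGE_EXTENSIONS)

def caption_path_for(filename, output_dir):
    """Map an image filename to its caption path: same stem, .txt extension."""
    stem = os.path.splitext(filename)[0]
    return os.path.join(output_dir, stem + ".txt")

print(is_supported_image("Beach.JPG"))
print(is_supported_image("notes.pdf"))
print(caption_path_for("beach.jpg", "captions"))
```

One consequence of this naming rule is that two images sharing a stem, such as `beach.jpg` and `beach.png`, map to the same caption file, so the later one overwrites the earlier one.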

How to Use:

- Input and Output Directory Setup:
    - Upload your images to a folder in Google Drive, link (mount) that Drive in Colab, and set `image_directory` to the folder's path.
    - Set `output_directory` to the folder where the generated captions (.txt files) should be saved.

- Why Use Colab?
    - The BLIP model is computationally demanding, especially in its large versions, so running this code on a GPU is essential for efficiency. Colab provides easy access to GPU resources, and the necessary Python libraries (like transformers and torch) are pre-installed or easy to install.
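
As a minimal sketch of the Colab setup, the cell below mounts Google Drive and reports which device PyTorch can use. The function names `mount_drive_if_in_colab` and `pick_device` are hypothetical helpers, and `/content/drive` is assumed as the mount point; neither appears in the notebook itself. Outside Colab the mount step is skipped.

```python
def mount_drive_if_in_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True if mounted."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True

def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print("Drive mounted:", mount_drive_if_in_colab())
print("Device:", pick_device())
```

If the device is reported as `cpu` in Colab, the runtime type probably needs to be switched to a GPU runtime before loading the model.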

What the Code Does:

This script automates caption generation for batches of images, which makes it useful for tasks like image dataset annotation, content creation, or generating alternative text descriptions for web images.

Once the script has run, the output directory holds one text file per image, each containing a caption that describes that image. This removes the need to caption the images manually.
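
For downstream tasks such as dataset annotation, the caption files can be read back into a single mapping. This is a sketch only: `load_captions` is a hypothetical helper, and it assumes the caption files are readable as UTF-8 text. The temporary directory stands in for a real output directory.

```python
import os
import tempfile

def load_captions(caption_dir):
    """Return {file stem: caption text} for every .txt file in caption_dir."""
    captions = {}
    for name in sorted(os.listdir(caption_dir)):
        if name.lower().endswith(".txt"):
            with open(os.path.join(caption_dir, name), encoding="utf-8") as f:
                captions[os.path.splitext(name)[0]] = f.read().strip()
    return captions

# Demonstration with a made-up caption file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "beach.txt"), "w", encoding="utf-8") as f:
        f.write("a sandy beach at sunset\n")
    print(load_captions(tmp))
```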

In [None]:
import os

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Use the GPU when one is available; fall back to the CPU (much slower) otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load BLIP model and processor
model_name = "Salesforce/blip-image-captioning-large"
processor = BlipProcessor.from_pretrained(model_name)
model = BlipForConditionalGeneration.from_pretrained(model_name).to(device)

def generate_captions_for_directory(image_dir, output_dir):
    # Ensure output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Sort the listing so images are processed in a predictable order
    for filename in sorted(os.listdir(image_dir)):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            # Load the image, closing the file handle afterwards, and convert it to RGB
            image_path = os.path.join(image_dir, filename)
            with Image.open(image_path) as img:
                raw_image = img.convert('RGB')

            # Preprocess the image, generate caption tokens, and decode them to text
            inputs = processor(raw_image, return_tensors="pt").to(device)
            out = model.generate(**inputs)
            caption = processor.decode(out[0], skip_special_tokens=True)

            # Save the caption to a text file named after the image
            text_file_path = os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt")
            with open(text_file_path, 'w', encoding='utf-8') as f:
                f.write(caption)

# Define input and output directories (replace the blank placeholders with your own paths before running)
image_directory = " "
output_directory = " "

generate_captions_for_directory(image_directory, output_directory)
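
The blank placeholders for `image_directory` and `output_directory` have to be filled in before the final call can succeed. As a minimal sketch, a hypothetical helper `count_images` (not part of the notebook) could confirm that the input folder exists and report how many files would be captioned before the model is run:

```python
import os

def count_images(image_dir, extensions=('.png', '.jpg', '.jpeg')):
    """Return how many files in image_dir match the given extensions.

    Raises ValueError when image_dir is blank or is not an existing directory.
    """
    if not image_dir.strip() or not os.path.isdir(image_dir):
        raise ValueError(f"Not a directory: {image_dir!r}")
    # Same case-insensitive extension check the captioning loop uses
    return sum(name.lower().endswith(extensions) for name in os.listdir(image_dir))
```

Calling `count_images(image_directory)` first gives a quick failure with a readable message if the path was left blank or mistyped.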
