
# Automatic Image Captioning with BLIP

This notebook generates descriptive captions for a directory of images using BLIP (Bootstrapping Language-Image Pre-training), a vision-language model from Salesforce that is available through Hugging Face. It processes every supported image in the directory and stores the results in a single structured CSV file.

Key Components of the Code:

    Pre-trained BLIP Model:
        The model used for captioning is Salesforce/blip-image-captioning-large, a pre-trained BLIP checkpoint hosted on the Hugging Face Hub.
        The model is loaded using Hugging Face's transformers library and is moved to the GPU (CUDA) for faster processing.

    Directory-Based Image Captioning:
        The script processes all images in a specified input directory (image_directory), generating captions for each image.
        It supports common image formats such as .jpg, .jpeg, and .png.

    Caption Generation Process:
        For each image, the script:
            Loads the image and converts it to RGB format.
            Feeds the image into the BLIP model to generate a caption.
            Decodes the generated caption into text form.

    Saving Captions to CSV:
        Instead of generating separate text files for each image, the captions are saved in a single CSV file (Metadata.csv).
        The CSV file contains two columns: filename (the name of the image) and prompt (the generated caption for the image).
        This approach provides a consolidated view of all image captions, making it easier to manage and analyze the data.
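
As a quick check on the file layout described above, the following minimal sketch writes a tiny two-column CSV and reads it back with `csv.DictReader`. The file name `Metadata.csv` and the column names `filename` and `prompt` come from this notebook; the sample rows and the helper name `load_captions` are made up for illustration.

```python
import csv
import os
import tempfile


def load_captions(csv_path):
    """Return a {filename: caption} dict from a filename/prompt CSV."""
    with open(csv_path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        return {row["filename"]: row["prompt"] for row in reader}


# Illustrative rows only; real captions come from the model.
sample_rows = [
    ["cat.jpg", "a cat sitting on a couch"],
    ["dog.png", "a dog running, on grass"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Metadata.csv")
    # Write the same layout the notebook produces: header row, then one row per image.
    with open(csv_path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["filename", "prompt"])
        writer.writerows(sample_rows)
    captions = load_captions(csv_path)

print(captions["dog.png"])
```

Because the `csv` module quotes fields as needed, a caption that contains a comma still round-trips as a single `prompt` value.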

How to Use:

    Input and Output Directory Setup:
        Set image_directory to the folder that contains your images; the script processes every valid image file in that directory.
        Set output_csv to the path where Metadata.csv should be saved; the file will contain the filenames and their corresponding captions.
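
Before running the captioning step, it can help to confirm that the input folder exists and to see which files would be picked up. The sketch below is one possible way to do that; `list_image_files` is a hypothetical helper, not part of the notebook, and the extension list simply mirrors the formats named above.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def list_image_files(image_dir):
    """Return sorted names of files in image_dir with a supported extension."""
    if not os.path.isdir(image_dir):
        raise FileNotFoundError(f"Image directory not found: {image_dir!r}")
    return sorted(
        name
        for name in os.listdir(image_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(image_dir, name))
    )


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.JPG", "b.png", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = list_image_files(tmp_dir)

print(found)
```

The extension check is case-insensitive, so `a.JPG` is included while `notes.txt` is skipped.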

Why Use This Script?

    Efficiency:
        The BLIP model requires significant computational power, particularly for large-scale image processing. Using a GPU accelerates the process, making it ideal for environments like Google Colab, which provides easy access to GPU resources.
    Automated Captioning:
        The script automates the task of captioning images in bulk, which is useful for tasks such as annotating image datasets, generating content for media, or creating alternative text descriptions for web images.
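
To make the GPU point concrete, here is a minimal sketch of choosing a device at runtime. The helper name `pick_device` is an assumption for illustration; `torch.cuda.is_available()` is the standard PyTorch check. The `ImportError` branch only keeps the snippet runnable on its own, since the notebook itself needs PyTorch installed.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

The resulting string could be passed to `.to(...)` when loading the model, so the same notebook also runs on a CPU-only machine, only more slowly.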

Important Note:

    Manual Review of Captions:
        The model's captions are generally good but can be inaccurate or too generic, so review them manually before relying on them. Some captions may need adjusting for your context and use case.
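
Manual review is easier with a shortlist of likely problem rows. The sketch below is one rough heuristic, not something the notebook does: it flags captions that are very short or that repeat verbatim across images. The function name `flag_for_review`, the `min_words` threshold, and the sample rows are all assumptions for illustration.

```python
from collections import Counter


def flag_for_review(rows, min_words=3):
    """Return filenames whose captions are very short or repeated verbatim."""
    # Count captions case-insensitively so "A man" and "a man" match.
    counts = Counter(caption.strip().lower() for _, caption in rows)
    flagged = []
    for filename, caption in rows:
        normalized = caption.strip().lower()
        too_short = len(normalized.split()) < min_words
        repeated = counts[normalized] > 1
        if too_short or repeated:
            flagged.append(filename)
    return flagged


# Illustrative rows only; real captions come from the model.
rows = [
    ["a.jpg", "a dog"],
    ["b.jpg", "a man riding a bike"],
    ["c.jpg", "A man riding a bike"],
    ["d.jpg", "a bowl of fruit on a table"],
]
print(flag_for_review(rows))
```

A caption that passes this filter can still be wrong, so the heuristic only prioritizes the review; it does not replace it.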

Once the script completes, you'll have a Metadata.csv file that stores all the image filenames and their corresponding captions, streamlining the captioning process and providing a structured output for further use.

In [None]:
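import os
import csv

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Use the GPU when one is available; otherwise fall back to the CPU (much slower).
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained captioning checkpoint and its matching processor.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large").to(device)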

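def generate_captions_for_directory(image_dir, output_csv_path):
    """Caption every supported image in image_dir and write the results to a CSV."""
    data = []

    # sorted() keeps the CSV row order stable between runs.
    for filename in sorted(os.listdir(image_dir)):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_path = os.path.join(image_dir, filename)

            # Open the file, convert to RGB, and close the file handle promptly.
            with Image.open(image_path) as img:
                raw_image = img.convert('RGB')

            # Preprocess the image and place the tensors on the model's device.
            inputs = processor(raw_image, return_tensors="pt").to(model.device)
            out = model.generate(**inputs)
            caption = processor.decode(out[0], skip_special_tokens=True)

            data.append([filename, caption])

    # Write a header row, then one row per image: filename, generated caption.
    with open(output_csv_path, 'w', newline='', encoding='utf-8') as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(['filename', 'prompt'])
        csvwriter.writerows(data)

    print(f"Metadata saved to {output_csv_path}")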

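# Define the input directory and the output CSV file path
image_directory = " "  # Placeholder: replace with the path to your image directory
# Placeholder: replace with the path where the CSV should be saved
# (on Colab, a mounted Google Drive usually lives under /content/drive/MyDrive)
output_csv = "content/drive/mydrive////Metadata.csv"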

generate_captions_for_directory(image_directory, output_csv)
