# Image-Based Product Description Generator

This Python script generates **marketing-oriented product descriptions** based on visual input. It utilizes a **Vision-to-Text model** called `SmolVLM-Instruct`, provided by Hugging Face. The model takes an image and, based on a specific textual prompt, produces a relevant marketing description for the image.

The script is written for **Google Colab**. As written, it runs on a CUDA GPU when one is available and otherwise falls back to the CPU; running it on a **TPU** requires additional setup that the script does not include.

---

## **Key Features**

* **Model**: `SmolVLM-Instruct`, a vision-language model capable of interpreting both images and text.
* **Objective**: Generate compelling, e-commerce-friendly marketing descriptions for each image.
* **Outputs**:

  * A `.txt` file containing detailed product descriptions.
  * A `.csv` file listing image names alongside their generated descriptions.
* **Google Colab**: A hardware accelerator speeds up generation when working with many images. The device-selection code uses a CUDA GPU if one is present; TPU use requires additional setup.

---

# **How to Run the Script in Google Colab**

### 1. **Open Google Colab**

Visit [Google Colab](https://colab.research.google.com) and start a new Python 3 notebook.

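### 2. **Select a Hardware Accelerator**

To choose an accelerator:

* Click **Runtime** → **Change runtime type**
* Under **Hardware accelerator**, pick one of the options Colab offers for your account.

The device-selection code in this script only detects CUDA GPUs, so a GPU runtime works without changes. On a TPU runtime the script falls back to the CPU unless you add TPU-specific setup.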

### 3. **Install Required Libraries**

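Run the following commands in a code cell to install dependencies. `accelerate` is included because loading the model with `device_map="auto"` depends on it:

```python
!pip install torch torchvision
!pip install transformers accelerate
!pip install Pillow
```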
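
After installing, it can help to confirm which versions ended up in the runtime. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is an example you should adjust to whatever you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Example package list (assumption: mirrors the pip commands in this step)
report = installed_versions(["torch", "torchvision", "transformers", "Pillow"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` needs to be installed before the rest of the notebook will run.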

### 4. **Upload and Run the Script**

Upload your product images to the folder the script reads (`/content` by default), then paste the code into a cell and run it. The script will:

* Load images from the specified folder,
* Generate a description for each,
* Save the results into `.txt` and `.csv` files.

---




# **Code Overview — In-Depth Explanation**

---

## **1. Importing Required Libraries**

```python
import torch
import csv
import os
import re
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image
```

### Explanation:

* **`torch`**: The PyTorch library is used for tensor computation and to manage model operations, including device placement (CPU/GPU/TPU).
* **`csv`**: Built-in Python module to create and write structured `.csv` files — essential for exporting data in tabular format.
* **`os`**: Provides functions for interacting with the operating system — used here to read directory contents and manipulate file paths.
* **`re`**: Regular expressions are used for pattern-based text manipulation — in this case, to clean up and parse model outputs.
* **`PIL (Image)`**: From the `Pillow` library, allows loading and processing of image files.
* **`transformers`**: Hugging Face’s library for working with pre-trained language and vision models:

  * **`AutoProcessor`**: Automatically loads the appropriate pre-processing pipeline (image and text).
  * **`AutoModelForVision2Seq`**: Loads a model that can take an image and output text (Vision-to-Text).
  * **`load_image`**: A utility that loads an image from a file path or URL and returns it as a PIL image the processor can accept.

---


## **2. Device Selection**

```python
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
```

### Explanation:

* This line picks the device that the input tensors are moved to:

  * **"cuda"** for NVIDIA GPUs,
  * **"cpu"** if no GPU is available.
* This avoids hardcoding the device, so the same code runs on both GPU and CPU machines.
* On a **TPU** runtime, `torch.cuda.is_available()` returns `False`, so the script runs on the CPU. Using the TPU itself requires additional setup that this script does not include.

---

## **3. Load Processor & Model**

```python
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    torch_dtype=torch.bfloat16,      # 16-bit weights to reduce memory use
    device_map="auto",               # let accelerate place the model (GPU if present, else CPU)
    attn_implementation="eager"      # standard attention; Flash Attention 2 is not used
)
```

### Explanation:

* Loads the pre-trained model **`SmolVLM-Instruct`**, which is designed for image-to-text tasks.
* The **processor** prepares the input data by:

  * Formatting text and image inputs,
  * Applying tokenization and image transformation,
  * Creating input tensors for the model.
* Model parameters:

  * **`torch_dtype=torch.bfloat16`**: Loads the weights in 16-bit bfloat16 precision, which halves weight memory compared with 32-bit floats on supported hardware.
  * **`device_map="auto"`**: Lets the `accelerate` library place the model on the available hardware (a GPU if present, otherwise the CPU). It does not place the model on a TPU.
  * **`attn_implementation="eager"`**: Selects the standard attention implementation instead of Flash Attention 2, which needs a supported GPU and an extra package. This replaces the older `use_flash_attention_2=False` flag, which is deprecated in recent `transformers` releases.
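
To see why the 16-bit setting matters, here is a back-of-the-envelope estimate of weight memory. The parameter count of 2 billion is an assumed round figure for illustration, not a number taken from this script, and the estimate covers weights only (no activations or generation cache).

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


ASSUMED_PARAMS = 2_000_000_000  # assumption: illustrative round number

fp32 = weight_memory_gib(ASSUMED_PARAMS, 4)  # float32 stores 4 bytes per value
bf16 = weight_memory_gib(ASSUMED_PARAMS, 2)  # bfloat16 stores 2 bytes per value

print(f"float32:  {fp32:.2f} GiB")
print(f"bfloat16: {bf16:.2f} GiB")
```

Halving the bytes per value halves the weight footprint, which is often the difference between fitting and not fitting on a small accelerator.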

---


## **4. Load Image Files**

```python
folder_path = "/content"
allowed_extensions = (".jpg", ".jpeg", ".png")
image_paths = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.lower().endswith(allowed_extensions)]
```

### Explanation:

* **`folder_path`**: Sets the directory where your images are located (e.g., Colab `/content`).
* **`allowed_extensions`**: Filters acceptable image formats to avoid incompatible or unsupported file types.
* **`image_paths`**: Constructs a list of full file paths to images that match the allowed extensions.

  * This list will later be iterated over to generate descriptions.
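
The same filtering can be sketched with `pathlib`. This variant is an illustration rather than the script's own code: the function name `find_images` is hypothetical, and it adds two small behaviors the original does not have (it skips directories and returns the paths in sorted order so runs are reproducible).

```python
from pathlib import Path
import tempfile

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(folder, extensions=ALLOWED_EXTENSIONS):
    """Return sorted paths of files in `folder` whose suffix is allowed."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


# Demonstration on a throwaway folder instead of /content
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.PNG", "a.jpg", "notes.txt", "c.jpeg"]:
        (Path(tmp) / name).touch()
    demo_names = [Path(p).name for p in find_images(tmp)]

print(demo_names)
```

In the notebook you would call `find_images("/content")` in place of the list comprehension.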

---


## **5. Prompt for Description Generation**

```python
prompt_text = "Based on this product image, write a persuasive, marketing-focused description for an e-commerce website. Highlight the product’s features, benefits, and the value it offers to the user."
```

### Explanation:

* This is the **instruction given to the model**. It's crafted to align with e-commerce goals:

  * **Persuasion**: Encourages the model to write with marketing language.
  * **Structure**: Prompts inclusion of features, benefits, and user value.
* Customizing this prompt allows different types of descriptions, e.g., technical specs, social media captions, or SEO-focused summaries.
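
One way to switch between description types is to keep the prompts in a dictionary. This is a sketch only: the names `PROMPT_VARIANTS` and `get_prompt` and the wording of the alternative prompts are hypothetical examples, not part of the original script.

```python
BASE_PROMPT = (
    "Based on this product image, write a persuasive, marketing-focused "
    "description for an e-commerce website."
)

# Hypothetical alternative prompts; the wording is illustrative only.
PROMPT_VARIANTS = {
    "marketing": BASE_PROMPT,
    "technical": "Based on this product image, list the visible technical "
                 "specifications as short bullet points.",
    "social": "Based on this product image, write a short, upbeat social "
              "media caption.",
    "seo": "Based on this product image, write a concise, keyword-rich "
           "summary suitable for search engines.",
}


def get_prompt(style="marketing"):
    """Look up a prompt by style, failing loudly on unknown styles."""
    if style not in PROMPT_VARIANTS:
        raise ValueError(
            f"Unknown style {style!r}; choose from {sorted(PROMPT_VARIANTS)}"
        )
    return PROMPT_VARIANTS[style]


prompt_text = get_prompt("social")
print(prompt_text)
```

Because the rest of the script only reads `prompt_text`, swapping the style is a one-line change.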

---

## **6. Prepare List for Outputs**

```python
descriptions = []
```

### Explanation:

* Initializes an empty list to store the results.
* Each entry will be a **tuple**: `(image_filename, generated_description)`
* This list is later used to write both `.txt` and `.csv` files.

---




## **7–11. Generate Descriptions from Images**

```python
for image_path in image_paths:
    try:
        # Load the image and keep its file name for the output files
        image = load_image(image_path)
        image_name = os.path.basename(image_path)

        # Single-turn chat message: one image plus the text instruction
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": prompt_text}
                ]
            }
        ]

        # Render the chat template to a prompt string, then build the model inputs
        prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = processor(text=prompt, images=[image], return_tensors="pt").to(DEVICE)

        # Generate up to 300 new tokens, penalizing already-used tokens
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=300,
            repetition_penalty=1.2
        )

        # Decode, then keep only the text after the first "Assistant:" marker
        raw_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        generated_text = re.split(r"Assistant:\s*", raw_output, maxsplit=1)[-1].strip()

        descriptions.append((image_name, generated_text))

    except Exception as e:
        # Report the failure and continue with the next image
        print(f"Error processing {image_path}: {e}")
```

### Explanation:

* **Image Loading**: Uses `load_image` to convert the file into a format readable by the model.
* **Message Template**: Follows a conversational format (like a chat with an AI assistant).
* **Prompt Generation**: The `apply_chat_template` function wraps your message into a structure the model expects (especially for instruction-tuned models).
* **Input Preparation**: Inputs are tokenized, converted to tensors, and sent to the appropriate device.
* **Text Generation**:

  * `generate()` produces the model’s response.
  * `max_new_tokens=300` limits the length of the generated description.
  * `repetition_penalty=1.2` penalizes tokens that have already appeared, which discourages repeated words and phrases.
* **Output Cleaning**:

  * The decoded output includes the prompt as well as the answer, so the regex split keeps only the text after the first “Assistant:” marker.
  * `.strip()` then removes leading and trailing whitespace.
* **Appending to List**:

  * Each image’s name and its corresponding generated description are saved for later export.
* **Error Handling**:

  * If an image is corrupted or the model fails to process it, the error is caught and printed, allowing the loop to continue.
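
The output-cleaning step can be tried in isolation. In this sketch the sample string is a made-up stand-in for a decoded model output (the exact format depends on the model's chat template), and the helper name `extract_answer` is hypothetical; the regular expression is the one used in the loop.

```python
import re

# Made-up stand-in for a decoded model output (prompt echo + answer).
raw_output = (
    "User: Based on this product image, write a description.\n"
    "Assistant: Meet the everyday backpack built for busy commutes.  "
)


def extract_answer(decoded: str) -> str:
    """Keep only the text after the first 'Assistant:' marker."""
    return re.split(r"Assistant:\s*", decoded, maxsplit=1)[-1].strip()


cleaned = extract_answer(raw_output)
print(cleaned)

# If the marker is absent, the whole string is returned (stripped).
unmarked = extract_answer("  no marker here  ")
print(unmarked)
```

Because `maxsplit=1` is used, a later occurrence of "Assistant:" inside the generated text is left untouched.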

---



## **12. Save to Text File**

```python
with open("product_descriptions.txt", "w", encoding="utf-8") as f:
    for image_name, description in descriptions:
        f.write(f"Image: {image_name}\nDescription: {description}\n\n")
```

### Explanation:

* Exports the descriptions in **plain text** format for easy readability or review.
* Each entry includes:

  * The image filename.
  * The generated product description.
* Useful for manual QA or copying into documents/emails.

---

## **13. Save to CSV File**

```python
with open("product_descriptions.csv", mode="w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Image", "Description"])
    writer.writerows(descriptions)
```

### Explanation:

* Exports the results in a **CSV format** for structured data use.
* Each row contains:

  * Column 1: Image file name.
  * Column 2: Generated description.
* Ideal for importing into:

  * E-commerce CMS platforms,
  * Excel/Google Sheets,
  * Databases or data pipelines.
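
Generated descriptions often contain commas, quotes, or line breaks, so it is worth checking that the CSV survives a round trip. The sketch below uses made-up sample rows and a temporary file; only the standard `csv` module is involved.

```python
import csv
import os
import tempfile

# Made-up rows in the same (image_name, description) shape as `descriptions`.
sample = [
    ("mug.jpg", 'A sturdy mug, glazed in "ocean blue".'),
    ("lamp.png", "Warm light.\nSecond line of copy."),
]

path = os.path.join(tempfile.mkdtemp(), "product_descriptions.csv")

# Write with the same header and options as the export step
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Image", "Description"])
    writer.writerows(sample)

# Read back by column name and compare with what was written
with open(path, encoding="utf-8", newline="") as f:
    rows = [(r["Image"], r["Description"]) for r in csv.DictReader(f)]

print(rows == sample)
```

Opening the file with `newline=""` on both write and read is what keeps embedded line breaks intact.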

---

The notebook cell that performs both exports writes UTF-8 files and labels each text entry with a numbered `--- Product N (image) ---` header:

```python
with open("product_descriptions.txt", "w", encoding="utf-8") as txt_file:
    for i, (img_name, desc) in enumerate(descriptions, 1):
        txt_file.write(f"--- Product {i} ({img_name}) ---\n{desc}\n\n")

with open("product_descriptions.csv", "w", encoding="utf-8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Image Name", "Description"])
    writer.writerows(descriptions)

print("\n Descriptions saved to 'product_descriptions.txt' and 'product_descriptions.csv'")
```


# **Results**

By running this Python script, you will automatically generate and save marketing-focused product descriptions for all supported image files.

## Output Files:

* **`product_descriptions.txt`**
  Contains each image’s name and its generated description in plain text format.

* **`product_descriptions.csv`**
  Structured data file with image names and their corresponding descriptions. Ideal for import into databases or e-commerce backends.

---

# **Use Cases**

* **E-Commerce Platforms**: Use these descriptions as product copy for your listings.
* **Marketing Teams**: Generate visual-based text for ads, banners, or campaigns.
* **Data Science / ML**: Use the descriptions for supervised learning or text analysis.

---

# **Important Notes**

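* The script is written for **Google Colab**. As written, it uses a CUDA GPU when one is available and otherwise the CPU, so a GPU runtime is the practical choice for large image batches; TPU use requires additional setup.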
* Only images with `.jpg`, `.jpeg`, or `.png` extensions in the specified folder will be processed.
* Any corrupted or unsupported image is skipped, and the error is printed to the notebook output; it does not appear in the `.txt` or `.csv` files.
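
If you want a record of skipped images rather than only printed messages, one option is to collect failures alongside results. This is a sketch under stated assumptions: `describe_all` is a hypothetical helper, and `fake_describe` is a stand-in for the real model call so the example runs without a model.

```python
def describe_all(image_paths, describe):
    """Run `describe` on each path; collect results and failures separately."""
    results, failures = [], []
    for path in image_paths:
        try:
            results.append((path, describe(path)))
        except Exception as exc:  # broad on purpose, mirroring the script
            failures.append((path, str(exc)))
    return results, failures


def fake_describe(path):
    """Stand-in for the model call: rejects one path to simulate a bad file."""
    if path.endswith("broken.png"):
        raise ValueError("cannot identify image file")
    return f"Description for {path}"


results, failures = describe_all(["a.jpg", "broken.png", "c.jpeg"], fake_describe)
print(len(results), "described;", len(failures), "skipped")
```

The `failures` list can then be written to its own file or reviewed after the run.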
