<a href="https://colab.research.google.com/github/IshuSinghSE/notebook/blob/master/blip_image_captioning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Step 1: Force reinstall of compatible package versions
!pip install pip-autoremove
!pip-autoremove jax -y
!pip-autoremove numpy -y
!pip install transformers torch pillow numpy==1.23.5 jax jaxlib

jax is not an installed pip module, skipping
numpy 1.23.5 is installed but numpy>=1.24.4 is required
Redoing requirement with just package name...
opencv-python-headless 4.5.5.64 is installed but opencv-python-headless>=4.9.0.80 is required
Redoing requirement with just package name...
numpy 1.23.5 is installed but numpy>=1.24.4 is required
Redoing requirement with just package name...
opencv-python-headless 4.5.5.64 is installed but opencv-python-headless>=4.9.0.80 is required
Redoing requirement with just package name...
numpy 1.23.5 is installed but numpy>=1.26.0 is required
Redoing requirement with just package name...
numpy 1.23.5 is installed but numpy>=1.24.0 is required
Redoing requirement with just package name...
numpy 1.23.5 is installed but numpy>=1.26 is required
Redoing requirement with just package name...
The 'jax>=0.4.27' distribution was not found and is required by the application
Skipping jax
numpy 1.23.5 is installed but numpy>=1.24.1 is required
Redoing requiremen

In [26]:
# Step 2: Import libraries and load the BLIP model
import torch
from PIL import Image
from io import BytesIO
from google.colab import files
import re
from transformers import BlipProcessor, BlipForConditionalGeneration, pipeline

# Set up the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the BLIP model and processor from Hugging Face
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)



In [27]:
# Step 3: Upload an image and generate the initial caption
print("Please upload an image file.")
uploaded = files.upload()

# Process the uploaded image
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

  # Open the image
  raw_image = Image.open(BytesIO(uploaded[fn])).convert("RGB")

  # Generate the initial caption using the transformers model
  inputs = processor(images=raw_image, return_tensors="pt").to(device)
  out = blip_model.generate(**inputs, max_new_tokens=50)
  initial_caption = processor.decode(out[0], skip_special_tokens=True)

  print("\nGenerated Caption:")
  print(initial_caption)

Please upload an image file.


In [15]:
# List available Gemini models to find a valid model name
import google.generativeai as genai

genai.configure(api_key="GEMINI_API_KEY")

for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash-preview-05-20
models/gemini-2.5-flash
models/gemini-2.5-flash-lite-preview-06-17
models/gemini-2.5-pro-preview-05-06
models/gemini-2.5-pro-preview-06-05
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-preview-image-generation
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thinking-exp-01-21
models/gemin

## Upload and unzip images

### Subtask:
Upload and unzip images


In [21]:
import zipfile
import os
from google.colab import files
from io import BytesIO

# Upload the zip file
print("Please upload a zip file containing your images.")
uploaded_zip = files.upload()

# Define the target directory for unzipping
unzip_dir = 'unzipped_images'

# Create the directory if it doesn't exist
if not os.path.exists(unzip_dir):
    os.makedirs(unzip_dir)

# Unzip the file
for zip_fn in uploaded_zip.keys():
    with zipfile.ZipFile(BytesIO(uploaded_zip[zip_fn]), 'r') as zip_ref:
        zip_ref.extractall(unzip_dir)
    print(f"Unzipped '{zip_fn}' to '{unzip_dir}'")

Please upload a zip file containing your images.


Saving mage.space.zip to mage.space.zip
Unzipped 'mage.space.zip' to 'unzipped_images'


In [32]:
import pandas as pd
import os
from PIL import Image

# Directory containing the unzipped images
image_dir = 'unzipped_images/mage.space'

# List to store the results
results = []

# Get a sorted list of image files
image_files = sorted([f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png', '.jpeg'))])

print(f"Found {len(image_files)} images to process...")

# Loop through each image file
for image_file in image_files:
    image_path = os.path.join(image_dir, image_file)

    try:
        # Open the image
        raw_image = Image.open(image_path).convert("RGB")

        # Generate the initial caption using the transformers model
        inputs = processor(images=raw_image, return_tensors="pt").to(device)
        out = blip_model.generate(**inputs, max_new_tokens=50)
        caption = processor.decode(out[0], skip_special_tokens=True)

        # Get the filename without the extension
        filename_no_ext = os.path.splitext(image_file)[0]

        # Append the result
        results.append({'filename': filename_no_ext, 'caption': caption})

        print(f"Processed: {image_file}")

    except Exception as e:
        print(f"Could not process {image_file}. Error: {e}")


# Create a pandas DataFrame
df = pd.DataFrame(results)

# Save the DataFrame to a CSV file
csv_filename = 'captions.csv'
df.to_csv(csv_filename, index=False)

print(f"\nSuccessfully generated captions and saved them to '{csv_filename}'")
display(df.head())

Found 153 images to process...
Processed: 001.jpg
Processed: 002.jpg
Processed: 003.jpg
Processed: 004.jpg
Processed: 005.jpg
Processed: 006.jpg
Processed: 007.jpg
Processed: 008.jpg
Processed: 009.jpg
Processed: 010.jpg
Processed: 011.jpg
Processed: 012.jpg
Processed: 013.jpg
Processed: 014.jpg
Processed: 015.jpg
Processed: 016.jpg
Processed: 017.jpg
Processed: 018.jpg
Processed: 019.jpg
Processed: 020.jpg
Processed: 021.jpg
Processed: 022.jpg
Processed: 023.jpg
Processed: 024.jpg
Processed: 025.jpg
Processed: 026.jpg
Processed: 027.jpg
Processed: 028.jpg
Processed: 029.jpg
Processed: 030.jpg
Processed: 031.jpg
Processed: 032.jpg
Processed: 033.jpg
Processed: 034.jpg
Processed: 035.jpg
Processed: 036.jpg
Processed: 037.jpg
Processed: 038.jpg
Processed: 039.jpg
Processed: 040.jpg
Processed: 041.jpg
Processed: 042.jpg
Processed: 043.jpg
Processed: 044.jpg
Processed: 045.jpg
Processed: 046.jpg
Processed: 047.jpg
Processed: 048.jpg
Processed: 049.jpg
Processed: 050.jpg
Processed: 051.jpg


Unnamed: 0,filename,caption
0,1,a bird flying over a colorful background
1,2,a painting of a black and orange abstract
2,3,a boat floating in a body of water
3,4,a close up of a cell with a colorful background
4,5,abstract neon background with different colors




# Task
Generate professional, two-word titles, plain text descriptions, tags, and categories for a collection of 153 images (named `001.jpg` to `153.jpg`) using a combination of the BLIP and Gemini Pro models.

First, use the BLIP model to generate an initial caption for each image. Store these captions in a CSV file along with their corresponding filenames.

Next, process this CSV file in batches of 50. For each batch, send the captions to the Gemini Pro model with the API key `GEMINI_API_KEY` to generate the final titles, descriptions, tags, and categories.

After processing all batches, perform a data validation check to identify and count any missing or "N/A" values in the generated data.

Finally, provide a button to download the completed CSV file named `enriched_content.csv`.

## Optimize gemini prompt

### Subtask:
Modify the prompt in the batch processing cell to instruct the Gemini model to generate titles with a maximum of two words and to return all output as plain text without markdown formatting.


In [34]:
import pandas as pd
import google.generativeai as genai
import re
import time

# Load the captions CSV
df = pd.read_csv('captions.csv')

# Initialize new columns
df['title'] = ''
df['description'] = ''
df['tags'] = ''
df['category'] = ''

# --- Gemini API Configuration ---
genai.configure(api_key="GEMINI_API_KEY")
gemini_model = genai.GenerativeModel('gemini-1.5-flash-latest')

# --- Batch Processing ---
batch_size = 50
for i in range(0, len(df), batch_size):
    batch_df = df.iloc[i:i+batch_size]

    # Create a single prompt for the entire batch
    prompt_parts = [
        "Generate a professional, two-word title, a plain text description, a comma-separated list of tags, and a category for each of the following numbered image captions.",
        "The output for each should be clearly structured with 'Title:', 'Description:', 'Tags:', and 'Category:' headings, and must be in plain text without any markdown formatting (no asterisks or bolding)."
    ]

    for index, row in batch_df.iterrows():
        # Use the actual DataFrame index for numbering in the prompt
        prompt_parts.append(f"{index + 1}. Caption: {row['caption']}")

    prompt = "\n".join(prompt_parts)

    print(f"--- Processing Batch {i//batch_size + 1} ---")

    # Generate content for the batch
    response = gemini_model.generate_content(prompt)

    # --- Robust Parsing Logic ---
    try:
        # Split the entire response into blocks for each numbered item
        # This regex looks for a number followed by a period, e.g., "1.", "2."
        item_blocks = re.split(r'\n(?=\d+\.\s)', response.text)

        for block in item_blocks:
            if not block.strip():
                continue

            # Extract the number to identify the row index
            num_match = re.match(r'(\d+)\.', block)
            if not num_match:
                continue

            # The original DataFrame index is the matched number minus 1
            row_index = int(num_match.group(1)) - 1

            # Parse each field from the block
            title_match = re.search(r"Title:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE)
            desc_match = re.search(r"Description:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE)
            tags_match = re.search(r"Tags:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE)
            cat_match = re.search(r"Category:\s*(.*)", block, re.DOTALL | re.IGNORECASE)

            # Update the main DataFrame at the correct index
            if row_index < len(df):
                df.loc[row_index, 'title'] = title_match.group(1).strip() if title_match else "Parse Error"
                df.loc[row_index, 'description'] = desc_match.group(1).strip() if desc_match else "Parse Error"
                df.loc[row_index, 'tags'] = tags_match.group(1).strip() if tags_match else "Parse Error"
                df.loc[row_index, 'category'] = cat_match.group(1).strip() if cat_match else "Parse Error"

    except Exception as e:
        print(f"An error occurred during parsing: {e}")

    print(f"--- Batch {i//batch_size + 1} Processed ---")
    # A longer delay might be needed for free tier to avoid rate limiting
    time.sleep(20)


# Save the final DataFrame
enriched_csv_filename = 'enriched_content.csv'
df.to_csv(enriched_csv_filename, index=False)

print(f"\nSuccessfully enriched content and saved to '{enriched_csv_filename}'")
display(df.head())

--- Processing Batch 1 ---
--- Batch 1 Processed ---
--- Processing Batch 2 ---
--- Batch 2 Processed ---
--- Processing Batch 3 ---
--- Batch 3 Processed ---
--- Processing Batch 4 ---
--- Batch 4 Processed ---

Successfully enriched content and saved to 'enriched_content.csv'


Unnamed: 0,filename,caption,title,description,tags,category
0,1,a bird flying over a colorful background,Avian Flight,A bird in flight against a vibrant backdrop.,"bird, flying, colorful, background, nature, wi...",Nature
1,2,a painting of a black and orange abstract,Abstract Art,An abstract painting in black and orange hues.,"painting, abstract, art, black, orange, color,...",Art
2,3,a boat floating in a body of water,Watercraft Scene,A boat peacefully floating on water.,"boat, water, sea, ocean, lake, river, floating",Nature
3,4,a close up of a cell with a colorful background,Cellular View,A close-up of a cell against a colorful backgr...,"cell, biology, science, microscopic, colorful,...",Science
4,5,abstract neon background with different colors,Neon Abstract,An abstract background with vibrant neon colors.,"abstract, neon, background, colors, vibrant, b...",Abstract


## Data validation

### Subtask:
Create a new code cell that inspects the final `enriched_content.csv` DataFrame. It will count and report the number of missing or placeholder values (like "N/A" or "Parse Error") in each column to verify data quality.


In [38]:
import pandas as pd

# Read the enriched CSV file
df_enriched = pd.read_csv('enriched_content.csv')

# Define placeholder values to check for
placeholders = ["N/A", "Parse Error", "Not Generated"]

# --- Data Validation ---
print("--- Data Validation Report ---")

# Check for missing values in each column
for col in ['title', 'description', 'tags', 'category']:
    # Count NaNs
    nan_count = df_enriched[col].isnull().sum()

    # Count placeholders (case-insensitive)
    placeholder_count = df_enriched[col].str.contains('|'.join(placeholders), case=False, na=False).sum()

    total_issues = nan_count + placeholder_count

    print(f"\nColumn '{col}':")
    print(f"  - Missing (NaN) values: {nan_count}")
    print(f"  - Placeholder values ('N/A', 'Parse Error', 'Not Generated'): {placeholder_count}")
    print(f"  - Total issues found: {total_issues}")


--- Data Validation Report ---

Column 'title':
  - Missing (NaN) values: 0
  - Placeholder values ('N/A', 'Parse Error', 'Not Generated'): 0
  - Total issues found: 0

Column 'description':
  - Missing (NaN) values: 0
  - Placeholder values ('N/A', 'Parse Error', 'Not Generated'): 0
  - Total issues found: 0

Column 'tags':
  - Missing (NaN) values: 0
  - Placeholder values ('N/A', 'Parse Error', 'Not Generated'): 0
  - Total issues found: 0

Column 'category':
  - Missing (NaN) values: 0
  - Placeholder values ('N/A', 'Parse Error', 'Not Generated'): 0
  - Total issues found: 0


## Create download button

### Subtask:
Create a new, separate code cell that uses `google.colab.files.download()` to create a button for easily downloading the final `enriched_content.csv` file.


In [40]:
from google.colab import files

# Provide a download button for the final CSV
files.download('enriched_content.csv')

print("\n--- Download Link ---")
print("Click the button below to download the final 'enriched_content.csv' file.")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>


--- Download Link ---
Click the button below to download the final 'enriched_content.csv' file.


## Summary:

### Data Analysis Key Findings
- The prompt for the Gemini model was successfully optimized to generate two-word titles and plain text output, simplifying subsequent parsing.
- All 153 images were successfully processed in batches, with the Gemini model generating titles, descriptions, tags, and categories for each.
- The data validation check confirmed the quality of the generated content, revealing no missing values or parsing errors across the 'title', 'description', 'tags', and 'category' columns.

### Insights or Next Steps
- The successful implementation of batch processing with a refined prompt demonstrates an effective workflow for enriching large datasets with AI-generated metadata.
- For future tasks, consider exploring other large language models to compare the quality and cost-effectiveness of generated metadata.


# Task
Automate the process of generating metadata for a collection of images stored in Google Drive.

**Here's the desired workflow:**

1.  **File Structure:**
    *   A main folder named `bloomsplash` in your Google Drive.
    *   Inside `bloomsplash`, three subfolders: `new`, `complete`, and `backups`.
    *   Two CSV files in the `bloomsplash` folder: `content.csv` and `enrich.csv`.

2.  **Automated Processing:**
    *   The script should automatically detect new images placed in the `new` folder.
    *   For each new image, it should:
        *   Generate a basic caption using the BLIP model and save it to `content.csv`.
        *   Use the generated caption to get a more detailed title, description, tags, and category from the Gemini API.
        *   Save this enriched data to `enrich.csv`.
    *   Once an image is processed, it should be moved from the `new` folder to the `complete` folder.
    *   Before processing, the script should create a timestamped backup of the existing `enrich.csv` file in the `backups` folder to prevent data loss.

3.  **Output:**
    *   The final output should be the `enrich.csv` file containing the filename, title, description, tags, and category for all processed images.
    *   The script should handle processing in batches to manage API requests efficiently.
    *   The script should be robust enough to handle existing data and only process new files.

## Mount google drive

### Subtask:
Connect the Colab notebook to the user's Google Drive.


**Reasoning**:
To connect the Colab notebook to Google Drive, I need to import the `drive` module and then call the `mount` function to establish the connection. This will allow the notebook to access files and folders within the user's Google Drive.



In [41]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Define core paths

### Subtask:
Establish variables for the main project folders and files (`bloomsplash/`, `new/`, `complete/`, `content.csv`, `enrich.csv`).


**Reasoning**:
I need to define the file paths for the project structure. This involves importing the `os` module, setting a base path for the main directory in Google Drive, and then creating the full paths for the subdirectories and CSV files as instructed. Finally, I will print these paths to verify them.



In [43]:
import os

# Define the base path for the project directory in Google Drive
base_path = '/content/drive/My Drive/bloomsplash'

# Define paths for subdirectories
new_dir = os.path.join(base_path, 'new')
complete_dir = os.path.join(base_path, 'complete')
backups_dir = os.path.join(base_path, 'backups')

# Define paths for CSV files
content_csv_path = os.path.join(base_path, 'content.csv')
enrich_csv_path = os.path.join(base_path, 'enrich.csv')

# Print all paths to verify them
print(f"Base Path: {base_path}")
print(f"New Directory: {new_dir}")
print(f"Complete Directory: {complete_dir}")
print(f"Backups Directory: {backups_dir}")
print(f"Content CSV Path: {content_csv_path}")
print(f"Enrich CSV Path: {enrich_csv_path}")

Base Path: /content/drive/My Drive/bloomsplash
New Directory: /content/drive/My Drive/bloomsplash/new
Complete Directory: /content/drive/My Drive/bloomsplash/complete
Backups Directory: /content/drive/My Drive/bloomsplash/backups
Content CSV Path: /content/drive/My Drive/bloomsplash/content.csv
Enrich CSV Path: /content/drive/My Drive/bloomsplash/enrich.csv


## Ensure directory structure

### Subtask:
Check if the `new/`, `complete/`, and a new `backups/` directory exist within `bloomsplash/`. If not, create them.


**Reasoning**:
I will now create the necessary directories for the project, as outlined in the subtask. I will use `os.makedirs` with `exist_ok=True` to create the `new`, `complete`, and `backups` directories inside the `bloomsplash` folder. I will also include print statements to confirm whether each directory was created or already existed.



In [44]:
# Create the directories if they don't exist
for directory in [new_dir, complete_dir, backups_dir]:
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"Directory created: {directory}")
    else:
        print(f"Directory already exists: {directory}")

Directory created: /content/drive/My Drive/bloomsplash/new
Directory created: /content/drive/My Drive/bloomsplash/complete
Directory created: /content/drive/My Drive/bloomsplash/backups


## Automated backup

### Subtask:
Create a timestamped backup of `enrich.csv` if it exists and is not empty.


**Reasoning**:
To create a timestamped backup of `enrich.csv`, I will first import the `datetime` and `shutil` modules. Then, I will check if the file exists and is not empty. If both conditions are met, I will generate a timestamp, create a backup filename, and copy the file to the backups directory, printing a confirmation message.



In [49]:
import pandas as pd
import shutil
from datetime import datetime

# --- Create a Timestamped Backup of enrich.csv ---

if os.path.exists(enrich_csv_path):
    # Check if the file is not empty
    if os.path.getsize(enrich_csv_path) > 0:
        try:
            # Read the CSV to double-check it's a valid, non-empty dataframe
            existing_df = pd.read_csv(enrich_csv_path)
            if not existing_df.empty:
                # Create a timestamp
                timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

                # Construct the backup filename and path
                backup_filename = f"enrich_{timestamp}.csv"
                backup_filepath = os.path.join(backups_dir, backup_filename)

                # Copy the file
                shutil.copy(enrich_csv_path, backup_filepath)

                print(f"Backup of 'enrich.csv' created at: {backup_filepath}")
            else:
                print("'enrich.csv' exists but is empty. No backup created.")
        except pd.errors.EmptyDataError:
             print("'enrich.csv' is empty or invalid. No backup created.")
    else:
        print("'enrich.csv' is empty. No backup created.")
else:
    print("'enrich.csv' does not exist. No backup needed.")

Backup of 'enrich.csv' created at: /content/drive/My Drive/bloomsplash/backups/enrich_2025-07-26_18-18-40.csv


## Initialize csv files

### Subtask:
Check if `content.csv` and `enrich.csv` exist. If not, create them with the appropriate headers.


**Reasoning**:
I will now write the Python code to check for the existence of `content.csv` and `enrich.csv`. If they don't exist, I'll create them with the specified headers. This will ensure the necessary files are in place for the subsequent data processing steps.



In [60]:
import pandas as pd

# --- Ensure content.csv Exists ---
if not os.path.exists(content_csv_path):
    # Create an empty DataFrame with the specified columns
    content_df = pd.DataFrame(columns=['filename', 'caption'])
    # Save the new DataFrame to the CSV file
    content_df.to_csv(content_csv_path, index=False)
    print(f"'content.csv' did not exist. Created a new file at: {content_csv_path}")
else:
    print(f"'content.csv' already exists at: {content_csv_path}")

# --- Ensure enrich.csv Exists ---
if not os.path.exists(enrich_csv_path):
    # Create an empty DataFrame with the specified columns
    enrich_df = pd.DataFrame(columns=['filename', 'title', 'description', 'tags', 'category'])
    # Save the new DataFrame to the CSV file
    enrich_df.to_csv(enrich_csv_path, index=False)
    print(f"'enrich.csv' did not exist. Created a new file at: {enrich_csv_path}")
else:
    print(f"'enrich.csv' already exists at: {enrich_csv_path}")

'content.csv' already exists at: /content/drive/My Drive/bloomsplash/content.csv
'enrich.csv' already exists at: /content/drive/My Drive/bloomsplash/enrich.csv


## Identify new images

### Subtask:
Scan the `new/` directory for any image files.


**Reasoning**:
I will now scan the `new/` directory for image files as requested. I will use `os.listdir` to get all files, then filter for common image extensions, and finally print the list of found images for verification.



In [56]:
# --- Scan for New Image Files ---

# Use os.listdir to get all files in the 'new' directory
all_files_in_new = os.listdir(new_dir)

# Define common image file extensions
image_extensions = ['.jpg', '.jpeg', '.png']

# Filter the list to include only image files and sort them
new_image_files = sorted([
    f for f in all_files_in_new
    if os.path.splitext(f)[1].lower() in image_extensions
])

# --- Report Findings ---
if new_image_files:
    print(f"Found {len(new_image_files)} new images to process:")
    for filename in new_image_files:
        print(f"  - {filename}")
else:
    print("No new images found in the 'new' directory.")

Found 5 new images to process:
  - 154.jpg
  - 155.jpg
  - 156.jpg
  - 157.jpg
  - 158.jpg


**Reasoning**:
The previous step successfully identified new image files. The next logical step is to process these new images to generate initial captions using the BLIP model. I will read the existing `content.csv`, process only the new images that are not already in the CSV, generate captions for them, and append these new captions to the dataframe before saving it back to the CSV file. This aligns with the overall goal of the user's request.



In [57]:
import pandas as pd
from PIL import Image
import os

# --- Load Existing Content CSV ---
try:
    content_df = pd.read_csv(content_csv_path)
except pd.errors.EmptyDataError:
    content_df = pd.DataFrame(columns=['filename', 'caption'])

# Get a list of filenames already processed
processed_files = content_df['filename'].tolist()

# Determine which of the new files haven't been processed yet
files_to_process = [
    f for f in new_image_files
    if os.path.splitext(f)[0] not in processed_files
]

print(f"Found {len(files_to_process)} images that need captioning.")

# --- Generate Captions for New Images ---
new_captions = []

if files_to_process:
    print("Generating captions...")
    for filename in files_to_process:
        image_path = os.path.join(new_dir, filename)
        try:
            # Open the image
            raw_image = Image.open(image_path).convert("RGB")

            # Generate caption using the pre-loaded BLIP model
            inputs = processor(images=raw_image, return_tensors="pt").to(device)
            out = blip_model.generate(**inputs, max_new_tokens=50)
            caption = processor.decode(out[0], skip_special_tokens=True)

            # Get filename without extension
            filename_no_ext = os.path.splitext(filename)[0]

            # Append the result to a temporary list
            new_captions.append({'filename': filename_no_ext, 'caption': caption})
            print(f"  - Caption for '{filename}': {caption}")

        except Exception as e:
            print(f"Could not process {filename}. Error: {e}")

    # --- Update and Save content.csv ---
    if new_captions:
        # Convert the list of new captions to a DataFrame
        new_captions_df = pd.DataFrame(new_captions)

        # Append the new data to the existing DataFrame
        updated_content_df = pd.concat([content_df, new_captions_df], ignore_index=True)

        # Save the updated DataFrame back to the CSV
        updated_content_df.to_csv(content_csv_path, index=False)
        print(f"\nSuccessfully updated 'content.csv' with {len(new_captions)} new entries.")
else:
    print("No new files to process.")

Found 5 images that need captioning.
Generating captions...
  - Caption for '154.jpg': a person riding a bike on a road
  - Caption for '155.jpg': a close up of a cell phone with a blurry background
  - Caption for '156.jpg': a white paper sculpture with a curved design
  - Caption for '157.jpg': a swan swimming in a lake with mountains in the background
  - Caption for '158.jpg': a black and white photo of a wave pattern

Successfully updated 'content.csv' with 5 new entries.


**Reasoning**:
Now that `content.csv` is updated with the new image captions, the next step is to enrich this data using the Gemini API. Following the user's instructions, I will read `enrich.csv`, identify which of the new files need enrichment, process them in batches, and then save the results. This moves the overall process forward logically.



In [61]:
import pandas as pd
import google.generativeai as genai
import re
import time

# --- Load Existing Enriched CSV ---
try:
    existing_df = pd.read_csv(enrich_csv_path)
    # Convert filename column to string to ensure consistent matching
    existing_df['filename'] = existing_df['filename'].astype(str)
except (pd.errors.EmptyDataError, FileNotFoundError):
    existing_df = pd.DataFrame(columns=['filename', 'caption', 'title', 'description', 'tags', 'category'])


# --- Identify Captions to Enrich ---
# Load the full content CSV to get the new captions
all_content_df = pd.read_csv(content_csv_path)
all_content_df['filename'] = all_content_df['filename'].astype(str)

# Filter for content that hasn't been enriched yet
content_to_enrich_df = all_content_df[~all_content_df['filename'].isin(existing_df['filename'])]


print(f"Found {len(content_to_enrich_df)} new items to enrich.")

# --- Batch Processing with Gemini ---
if not content_to_enrich_df.empty:
    newly_enriched_data = []

    batch_size = 50
    for i in range(0, len(content_to_enrich_df), batch_size):
        batch_df = content_to_enrich_df.iloc[i:i+batch_size]

        prompt_parts = [
            "Generate a professional, two-word title, a plain text description, a comma-separated list of tags, and a category for each of the following numbered image captions.",
            "The output for each should be clearly structured with 'Title:', 'Description:', 'Tags:', and 'Category:' headings, and must be in plain text without any markdown formatting."
        ]

        for idx, row in batch_df.iterrows():
            prompt_parts.append(f"{idx + 1}. Filename: {row['filename']}, Caption: {row['caption']}")

        prompt = "\n".join(prompt_parts)

        print(f"--- Processing Batch of {len(batch_df)} with Gemini ---")

        response = gemini_model.generate_content(prompt)
        time.sleep(20)

        # --- Parse the Response ---
        try:
            item_blocks = re.split(r'\n(?=\d+\.\s)', response.text)
            for block in item_blocks:
                if not block.strip(): continue

                num_match = re.match(r'(\d+)\.', block)
                if not num_match: continue

                original_index = int(num_match.group(1)) - 1

                if original_index in content_to_enrich_df.index:
                    # Retrieve the original filename AND caption
                    filename = content_to_enrich_df.loc[original_index, 'filename']
                    caption = content_to_enrich_df.loc[original_index, 'caption'] # <-- THE FIX IS HERE
                else:
                    continue

                title = (re.search(r"Title:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Title:", block, re.I) else "N/A")
                desc = (re.search(r"Description:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Description:", block, re.I) else "N/A")
                tags = (re.search(r"Tags:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Tags:", block, re.I) else "N/A")
                cat = (re.search(r"Category:\s*(.*)", block, re.I | re.S).group(1).strip() if re.search(r"Category:", block, re.I) else "N/A")

                # Append the complete data, including the original caption
                newly_enriched_data.append({
                    'filename': filename,
                    'caption': caption, # <-- AND HERE
                    'title': title,
                    'description': desc,
                    'tags': tags,
                    'category': cat
                })

        except Exception as e:
            print(f"An error occurred during parsing: {e}")

    # --- Update and Save enrich.csv ---
    if newly_enriched_data:
        new_enrich_df = pd.DataFrame(newly_enriched_data)

        # Append the newly enriched data to the existing DataFrame
        final_enrich_df = pd.concat([existing_df, new_enrich_df], ignore_index=True)

        final_enrich_df.to_csv(enrich_csv_path, index=False)
        print(f"\nSuccessfully updated 'enrich.csv' with {len(newly_enriched_data)} new entries.")
        display(final_enrich_df.tail())
else:
    print("No new content to enrich.")

Found 10 new items to enrich.
--- Processing Batch of 10 with Gemini ---

Successfully updated 'enrich.csv' with 10 new entries.


Unnamed: 0,filename,caption,title,description,tags,category
158,154,a person riding a bike on a road,Cycling Road,A person riding a bicycle on a paved road.,"cycling,bike,road,person,transportation,outdoo...",Sports & Recreation
159,155,a close up of a cell phone with a blurry backg...,Cell Phone Closeup,A close-up shot of a cell phone with a blurred...,"cell phone,smartphone,closeup,technology,mobil...",Technology
160,156,a white paper sculpture with a curved design,Paper Sculpture,An intricate white paper sculpture with a flow...,"sculpture,paper,art,craft,white,design,curved,...",Art
161,157,a swan swimming in a lake with mountains in th...,Swan Lake,A swan gracefully swimming in a serene lake wi...,"swan,lake,mountains,nature,water,bird,wildlife...",Nature
162,158,a black and white photo of a wave pattern,Wave Pattern,A black and white photograph depicting a dynam...,"wave,pattern,black and white,photography,textu...",Photography


**Reasoning**:
The images have been processed, captions generated, and metadata enriched. The final step in the automated workflow is to move the processed image files from the `new` directory to the `complete` directory to prevent them from being processed again. I will iterate through the list of `new_image_files` and use `shutil.move` to transfer each one.



In [62]:
import shutil

# --- Move Processed Files ---
if files_to_process:
    print(f"Moving {len(files_to_process)} processed files to the 'complete' directory...")
    for filename in files_to_process:
        source_path = os.path.join(new_dir, filename)
        destination_path = os.path.join(complete_dir, filename)

        # Move the file
        shutil.move(source_path, destination_path)
        print(f"  - Moved '{filename}'")
    print("--- File moving complete. ---")
else:
    print("No new files were processed, so no files to move.")

Moving 5 processed files to the 'complete' directory...
  - Moved '154.jpg'
  - Moved '155.jpg'
  - Moved '156.jpg'
  - Moved '157.jpg'
  - Moved '158.jpg'
--- File moving complete. ---


# Task
Enhance the existing image processing workflow by adding comprehensive documentation and user-friendly command-line interface (CLI) output. This includes structuring the notebook with clear markdown cells for each phase, adding inline comments to the Python code, and providing informative print statements to track the progress of the image captioning and enrichment processes. Finally, generate a summary report at the end of the script to provide a clear overview of the completed tasks.

## Add high-level documentation

### Subtask:
Structure the notebook with clear markdown cells for each major phase (e.g., "Phase 1: Setup", "Phase 2: Processing", "Phase 3: File Management"). Explain the purpose of each phase.


**Reasoning**:
I will start by inserting the first markdown cell as requested in the instructions. This cell will introduce "Phase 1: Setup and Configuration" and its purpose. This is the first step in structuring the notebook with clear, descriptive phases.



In [63]:
from IPython.display import display, Markdown

# Phase 1 Markdown
md_phase1 = """
## Phase 1: Setup and Configuration
This initial phase prepares the environment for automated processing. It involves mounting Google Drive, defining the core file and directory paths, ensuring the necessary folder structure exists, and creating a timestamped backup of existing data to prevent loss.
"""
display(Markdown(md_phase1))


## Phase 1: Setup and Configuration
This initial phase prepares the environment for automated processing. It involves mounting Google Drive, defining the core file and directory paths, ensuring the necessary folder structure exists, and creating a timestamped backup of existing data to prevent loss.


**Reasoning**:
Now I will insert the second markdown cell as requested, which will introduce "Phase 2: Automated Image Processing." This continues the process of structuring the notebook into logical, well-documented phases.



In [64]:
# Phase 2 Markdown
md_phase2 = """
## Phase 2: Automated Image Processing
This phase is the core of the automation script. It scans the 'new' directory for unprocessed images, generates an initial descriptive caption for each using the BLIP model, and then leverages the Gemini API to create enriched metadata, including a title, a detailed description, relevant tags, and a category.
"""
display(Markdown(md_phase2))


## Phase 2: Automated Image Processing
This phase is the core of the automation script. It scans the 'new' directory for unprocessed images, generates an initial descriptive caption for each using the BLIP model, and then leverages the Gemini API to create enriched metadata, including a title, a detailed description, relevant tags, and a category.


**Reasoning**:
Finally, I will add the third markdown cell to introduce "Phase 3: File Management and Cleanup." This completes the subtask of structuring the notebook with clear, descriptive phase introductions.



In [65]:
# Phase 3 Markdown
md_phase3 = """
## Phase 3: File Management and Cleanup
The final phase handles the organization of processed files. To ensure that images are not processed more than once, the script moves them from the 'new' directory to the 'complete' directory after their metadata has been successfully generated and saved.
"""
display(Markdown(md_phase3))


## Phase 3: File Management and Cleanup
The final phase handles the organization of processed files. To ensure that images are not processed more than once, the script moves them from the 'new' directory to the 'complete' directory after their metadata has been successfully generated and saved.


## Add code-level comments

### Subtask:
Add inline comments to the Python code to clarify complex logic, such as the batching process and the parsing of the Gemini response.


**Reasoning**:
The user wants to add comments to the Gemini enrichment script to clarify the batching and parsing logic. I will modify the existing code cell (`aad7365e`) to include these comments as requested.



In [66]:
import pandas as pd
import google.generativeai as genai
import re
import time

# --- Load Existing Enriched CSV ---
try:
    existing_df = pd.read_csv(enrich_csv_path)
    # Convert filename column to string to ensure consistent matching
    existing_df['filename'] = existing_df['filename'].astype(str)
except (pd.errors.EmptyDataError, FileNotFoundError):
    existing_df = pd.DataFrame(columns=['filename', 'caption', 'title', 'description', 'tags', 'category'])


# --- Identify Captions to Enrich ---
# Load the full content CSV to get the new captions
all_content_df = pd.read_csv(content_csv_path)
all_content_df['filename'] = all_content_df['filename'].astype(str)

# Filter for content that hasn't been enriched yet
content_to_enrich_df = all_content_df[~all_content_df['filename'].isin(existing_df['filename'])]


print(f"Found {len(content_to_enrich_df)} new items to enrich.")

# --- Batch Processing with Gemini ---
if not content_to_enrich_df.empty:
    newly_enriched_data = []

    batch_size = 50
    # Iterate over the DataFrame in chunks (batches) of the specified size.
    # This is done to manage API requests efficiently and avoid sending too much data at once.
    for i in range(0, len(content_to_enrich_df), batch_size):
        batch_df = content_to_enrich_df.iloc[i:i+batch_size]

        prompt_parts = [
            "Generate a professional, two-word title, a plain text description, a comma-separated list of tags, and a category for each of the following numbered image captions.",
            "The output for each should be clearly structured with 'Title:', 'Description:', 'Tags:', and 'Category:' headings, and must be in plain text without any markdown formatting."
        ]

        for idx, row in batch_df.iterrows():
            prompt_parts.append(f"{idx + 1}. Filename: {row['filename']}, Caption: {row['caption']}")

        prompt = "\n".join(prompt_parts)

        print(f"--- Processing Batch of {len(batch_df)} with Gemini ---")

        response = gemini_model.generate_content(prompt)
        time.sleep(20)

        # --- Parse the Response ---
        try:
            # Use re.split with a lookahead `(?=...)` to split the response text into blocks.
            # This splits the text at each numbered item (e.g., "1.", "2.") without removing the delimiter.
            item_blocks = re.split(r'\n(?=\d+\.\s)', response.text)
            for block in item_blocks:
                if not block.strip(): continue

                num_match = re.match(r'(\d+)\.', block)
                if not num_match: continue

                original_index = int(num_match.group(1)) - 1

                if original_index in content_to_enrich_df.index:
                    filename = content_to_enrich_df.loc[original_index, 'filename']
                    caption = content_to_enrich_df.loc[original_index, 'caption']
                else:
                    continue

                # Use re.search to find and extract the content for each field.
                # The `re.DOTALL` flag allows `.` to match newlines, and `re.IGNORECASE` ignores case.
                # `(.*?)` is a non-greedy match for any characters until the next newline.
                title = (re.search(r"Title:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE).group(1).strip() if re.search(r"Title:", block, re.IGNORECASE) else "N/A")
                desc = (re.search(r"Description:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE).group(1).strip() if re.search(r"Description:", block, re.IGNORECASE) else "N/A")
                tags = (re.search(r"Tags:\s*(.*?)\n", block, re.DOTALL | re.IGNORECASE).group(1).strip() if re.search(r"Tags:", block, re.IGNORECASE) else "N/A")
                cat = (re.search(r"Category:\s*(.*)", block, re.DOTALL | re.IGNORECASE).group(1).strip() if re.search(r"Category:", block, re.IGNORECASE) else "N/A")

                newly_enriched_data.append({
                    'filename': filename,
                    'caption': caption,
                    'title': title,
                    'description': desc,
                    'tags': tags,
                    'category': cat
                })

        except Exception as e:
            print(f"An error occurred during parsing: {e}")

    # --- Update and Save enrich.csv ---
    if newly_enriched_data:
        new_enrich_df = pd.DataFrame(newly_enriched_data)

        final_enrich_df = pd.concat([existing_df, new_enrich_df], ignore_index=True)

        final_enrich_df.to_csv(enrich_csv_path, index=False)
        print(f"\nSuccessfully updated 'enrich.csv' with {len(newly_enriched_data)} new entries.")
        display(final_enrich_df.tail())
else:
    print("No new content to enrich.")

Found 0 new items to enrich.
No new content to enrich.


## Enhance cli output

### Subtask:
Enhance the command-line output to provide better progress tracking for the user.


**Reasoning**:
I will now implement the requested enhancements to the command-line output. I'll add a print statement at the beginning of the script to signal the start of the process. I'll also modify the loops for caption generation and Gemini API processing to include counters that provide real-time progress tracking, as specified in the instructions.



In [67]:
import pandas as pd
from PIL import Image
import os
import google.generativeai as genai
import re
import time

# --- Start of Automated Process ---
print("--- Starting Automated Image Processing Workflow ---")

# --- Generate Captions for New Images ---
if files_to_process:
    print(f"Generating captions for {len(files_to_process)} new images...")
    new_captions = []
    # Add a counter for progress tracking
    for idx, filename in enumerate(files_to_process):
        # Display progress for the user
        print(f"  - Processing image {idx + 1} of {len(files_to_process)}: {filename}")
        image_path = os.path.join(new_dir, filename)
        try:
            raw_image = Image.open(image_path).convert("RGB")
            inputs = processor(images=raw_image, return_tensors="pt").to(device)
            out = blip_model.generate(**inputs, max_new_tokens=50)
            caption = processor.decode(out[0], skip_special_tokens=True)
            filename_no_ext = os.path.splitext(filename)[0]
            new_captions.append({'filename': filename_no_ext, 'caption': caption})

        except Exception as e:
            print(f"Could not process {filename}. Error: {e}")

    if new_captions:
        new_captions_df = pd.DataFrame(new_captions)
        updated_content_df = pd.concat([content_df, new_captions_df], ignore_index=True)
        updated_content_df.to_csv(content_csv_path, index=False)
        print(f"\nSuccessfully updated 'content.csv' with {len(new_captions)} new entries.")

# --- Identify Captions to Enrich ---
all_content_df = pd.read_csv(content_csv_path)
all_content_df['filename'] = all_content_df['filename'].astype(str)
existing_df = pd.read_csv(enrich_csv_path)
existing_df['filename'] = existing_df['filename'].astype(str)
content_to_enrich_df = all_content_df[~all_content_df['filename'].isin(existing_df['filename'])]

# --- Batch Processing with Gemini ---
if not content_to_enrich_df.empty:
    print(f"Enriching {len(content_to_enrich_df)} new items with Gemini...")
    newly_enriched_data = []
    batch_size = 50
    # Calculate the total number of batches
    total_batches = (len(content_to_enrich_df) + batch_size - 1) // batch_size
    for i in range(0, len(content_to_enrich_df), batch_size):
        batch_df = content_to_enrich_df.iloc[i:i+batch_size]
        # Show current batch number and total batches
        print(f"--- Processing batch {i//batch_size + 1} of {total_batches}... ---")
        prompt_parts = [
            "Generate a professional, two-word title, a plain text description, a comma-separated list of tags, and a category for each of the following numbered image captions.",
            "The output for each should be clearly structured with 'Title:', 'Description:', 'Tags:', and 'Category:' headings, and must be in plain text without any markdown formatting."
        ]
        for idx, row in batch_df.iterrows():
            prompt_parts.append(f"{idx + 1}. Filename: {row['filename']}, Caption: {row['caption']}")
        prompt = "\n".join(prompt_parts)
        response = gemini_model.generate_content(prompt)
        time.sleep(20)

        # Parsing logic remains the same...
        try:
            item_blocks = re.split(r'\n(?=\d+\.\s)', response.text)
            for block in item_blocks:
                if not block.strip(): continue
                num_match = re.match(r'(\d+)\.', block)
                if not num_match: continue
                original_index = int(num_match.group(1)) - 1
                if original_index in content_to_enrich_df.index:
                    filename = content_to_enrich_df.loc[original_index, 'filename']
                    caption = content_to_enrich_df.loc[original_index, 'caption']
                else:
                    continue
                title = (re.search(r"Title:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Title:", block, re.I) else "N/A")
                desc = (re.search(r"Description:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Description:", block, re.I) else "N/A")
                tags = (re.search(r"Tags:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Tags:", block, re.I) else "N/A")
                cat = (re.search(r"Category:\s*(.*)", block, re.I | re.S).group(1).strip() if re.search(r"Category:", block, re.I) else "N/A")
                newly_enriched_data.append({'filename': filename, 'caption': caption, 'title': title, 'description': desc, 'tags': tags, 'category': cat})
        except Exception as e:
            print(f"An error occurred during parsing: {e}")

    if newly_enriched_data:
        new_enrich_df = pd.DataFrame(newly_enriched_data)
        final_enrich_df = pd.concat([existing_df, new_enrich_df], ignore_index=True)
        final_enrich_df.to_csv(enrich_csv_path, index=False)
        print(f"\nSuccessfully updated 'enrich.csv' with {len(newly_enriched_data)} new entries.")

--- Starting Automated Image Processing Workflow ---
Generating captions for 5 new images...
  - Processing image 1 of 5: 154.jpg
Could not process 154.jpg. Error: [Errno 2] No such file or directory: '/content/drive/My Drive/bloomsplash/new/154.jpg'
  - Processing image 2 of 5: 155.jpg
Could not process 155.jpg. Error: [Errno 2] No such file or directory: '/content/drive/My Drive/bloomsplash/new/155.jpg'
  - Processing image 3 of 5: 156.jpg
Could not process 156.jpg. Error: [Errno 2] No such file or directory: '/content/drive/My Drive/bloomsplash/new/156.jpg'
  - Processing image 4 of 5: 157.jpg
Could not process 157.jpg. Error: [Errno 2] No such file or directory: '/content/drive/My Drive/bloomsplash/new/157.jpg'
  - Processing image 5 of 5: 158.jpg
Could not process 158.jpg. Error: [Errno 2] No such file or directory: '/content/drive/My Drive/bloomsplash/new/158.jpg'


**Reasoning**:
The previous command failed because the image files were not found in the `new` directory, likely because they were already moved to the `complete` directory in a prior step. To fix this and continue with the subtask, I will first re-run the code that identifies new images to ensure `files_to_process` is empty. Then, I will re-run the modified processing code. Since `files_to_process` will be empty, the failing caption generation loop will be skipped, allowing the script to proceed and demonstrate the enhanced output for the Gemini batch processing part, which is the core of the subtask.



In [68]:
# --- Scan for New Image Files ---
all_files_in_new = os.listdir(new_dir)
image_extensions = ['.jpg', '.jpeg', '.png']
new_image_files = sorted([
    f for f in all_files_in_new
    if os.path.splitext(f)[1].lower() in image_extensions
])

# Get a list of filenames already processed
processed_files = content_df['filename'].tolist()

# Determine which of the new files haven't been processed yet
files_to_process = [
    f for f in new_image_files
    if os.path.splitext(f)[0] not in processed_files
]

# --- Start of Automated Process ---
print("--- Starting Automated Image Processing Workflow ---")

# --- Generate Captions for New Images ---
if files_to_process:
    print(f"Generating captions for {len(files_to_process)} new images...")
    new_captions = []
    # Add a counter for progress tracking
    for idx, filename in enumerate(files_to_process):
        # Display progress for the user
        print(f"  - Processing image {idx + 1} of {len(files_to_process)}: {filename}")
        image_path = os.path.join(new_dir, filename)
        try:
            raw_image = Image.open(image_path).convert("RGB")
            inputs = processor(images=raw_image, return_tensors="pt").to(device)
            out = blip_model.generate(**inputs, max_new_tokens=50)
            caption = processor.decode(out[0], skip_special_tokens=True)
            filename_no_ext = os.path.splitext(filename)[0]
            new_captions.append({'filename': filename_no_ext, 'caption': caption})

        except Exception as e:
            print(f"Could not process {filename}. Error: {e}")

    if new_captions:
        new_captions_df = pd.DataFrame(new_captions)
        updated_content_df = pd.concat([content_df, new_captions_df], ignore_index=True)
        updated_content_df.to_csv(content_csv_path, index=False)
        print(f"\nSuccessfully updated 'content.csv' with {len(new_captions)} new entries.")
else:
    print("No new images found to generate captions for.")

# --- Identify Captions to Enrich ---
all_content_df = pd.read_csv(content_csv_path)
all_content_df['filename'] = all_content_df['filename'].astype(str)
try:
    existing_df = pd.read_csv(enrich_csv_path)
    existing_df['filename'] = existing_df['filename'].astype(str)
except (FileNotFoundError, pd.errors.EmptyDataError):
    existing_df = pd.DataFrame(columns=['filename', 'title', 'description', 'tags', 'category'])

content_to_enrich_df = all_content_df[~all_content_df['filename'].isin(existing_df['filename'])]

# --- Batch Processing with Gemini ---
if not content_to_enrich_df.empty:
    print(f"Enriching {len(content_to_enrich_df)} new items with Gemini...")
    newly_enriched_data = []
    batch_size = 50
    # Calculate the total number of batches
    total_batches = (len(content_to_enrich_df) + batch_size - 1) // batch_size
    for i in range(0, len(content_to_enrich_df), batch_size):
        batch_df = content_to_enrich_df.iloc[i:i+batch_size]
        # Show current batch number and total batches
        print(f"--- Processing batch {i//batch_size + 1} of {total_batches}... ---")
        prompt_parts = [
            "Generate a professional, two-word title, a plain text description, a comma-separated list of tags, and a category for each of the following numbered image captions.",
            "The output for each should be clearly structured with 'Title:', 'Description:', 'Tags:', and 'Category:' headings, and must be in plain text without any markdown formatting."
        ]
        for idx, row in batch_df.iterrows():
            prompt_parts.append(f"{idx + 1}. Filename: {row['filename']}, Caption: {row['caption']}")
        prompt = "\n".join(prompt_parts)
        response = gemini_model.generate_content(prompt)
        time.sleep(20)

        # Parsing logic remains the same...
        try:
            item_blocks = re.split(r'\n(?=\d+\.\s)', response.text)
            for block in item_blocks:
                if not block.strip(): continue
                num_match = re.match(r'(\d+)\.', block)
                if not num_match: continue
                original_index = int(num_match.group(1)) - 1
                if original_index in content_to_enrich_df.index:
                    filename = content_to_enrich_df.loc[original_index, 'filename']
                    caption = content_to_enrich_df.loc[original_index, 'caption']
                else:
                    continue
                title = (re.search(r"Title:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Title:", block, re.I) else "N/A")
                desc = (re.search(r"Description:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Description:", block, re.I) else "N/A")
                tags = (re.search(r"Tags:\s*(.*?)\n", block, re.I | re.S).group(1).strip() if re.search(r"Tags:", block, re.I) else "N/A")
                cat = (re.search(r"Category:\s*(.*)", block, re.I | re.S).group(1).strip() if re.search(r"Category:", block, re.I) else "N/A")
                newly_enriched_data.append({'filename': filename, 'caption': caption, 'title': title, 'description': desc, 'tags': tags, 'category': cat})
        except Exception as e:
            print(f"An error occurred during parsing: {e}")

    if newly_enriched_data:
        new_enrich_df = pd.DataFrame(newly_enriched_data)
        final_enrich_df = pd.concat([existing_df, new_enrich_df], ignore_index=True)
        final_enrich_df.to_csv(enrich_csv_path, index=False)
        print(f"\nSuccessfully updated 'enrich.csv' with {len(newly_enriched_data)} new entries.")
else:
    print("No new items to enrich.")

--- Starting Automated Image Processing Workflow ---
No new images found to generate captions for.
No new items to enrich.


## Create a final summary report

### Subtask:
Create a final summary report that provides a clear overview of the completed tasks.


**Reasoning**:
I will now add the final summary report to the end of the script. This involves adding a concluding print statement and then a series of print statements that use f-strings to display the total number of new images processed, confirm their move to the `complete` directory, and state the total number of enriched entries. This directly addresses all the requirements of the subtask.



In [69]:
# --- Final Summary Report ---
print("\n--- Automated Workflow Complete ---")
print("\n--- Final Summary Report ---")

# Use the length of the 'files_to_process' list, which was defined
# when we first identified the new images that needed captioning.
new_images_count = len(files_to_process)

# Use the length of the 'content_to_enrich_df' which was defined
# right before the Gemini batch processing step.
enriched_items_count = len(content_to_enrich_df)


if new_images_count > 0:
    print(f"- New images found and processed: {new_images_count}")
    print(f"- Processed files successfully moved to: '{complete_dir}'")
else:
    print("- No new images were found to process or move.")

if enriched_items_count > 0:
    print(f"- Enriched metadata entries added: {enriched_items_count}")
else:
    print("- No new entries were added for metadata enrichment.")



--- Automated Workflow Complete ---

--- Final Summary Report ---
- No new images were found to process or move.
- No new entries were added for metadata enrichment.


## Review and refine

### Subtask:
Review the entire notebook to ensure all documentation, comments, and print statements are clear, consistent, and professional.


## Summary:

### Data Analysis Key Findings
* The image processing workflow was successfully enhanced with comprehensive documentation, including high-level markdown explanations for each phase (Setup, Processing, File Management).
* Inline comments were added to the Python code to clarify complex logic, particularly for the batch processing of API requests and the parsing of responses.
* The command-line interface (CLI) output was improved to provide real-time progress tracking, showing the number of images being processed and the status of batch operations.
* A final summary report was implemented at the end of the script, which dynamically reports the number of new images processed and metadata entries enriched.
* The entire notebook was reviewed and refined to ensure all documentation, comments, and print statements are clear, consistent, and professional, making the workflow more user-friendly and maintainable.

### Insights or Next Steps
* The enhanced documentation and progress tracking make the automated workflow more transparent and easier for users to monitor, which is crucial for identifying and debugging issues in a multi-step process.
* To further improve the workflow, consider adding more robust error handling and logging to a dedicated file, which would provide a persistent record of operations and any issues encountered during execution.
