<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Product_Marketing_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Sample Data for AI Product Marketing System for Image Generation
# Product Marketing AI System

**Overview:**

The Product Marketing AI System automatically creates high-quality marketing visuals by processing, enhancing, and refining input images.

**Key Steps:**
- **Input & Preprocessing:**  
  Users provide a base image (e.g., a background or scene) and a secondary image (e.g., a product or model). The system enhances these images, adjusts resolution and contrast, and generates segmentation masks.

- **Context & Prompt Generation:**  
  It analyzes the images to extract contextual details and automatically generates a detailed, structured marketing prompt tailored to the product and target audience.

- **Image Generation & Refinement:**  
  Advanced AI models use the prompt to generate photorealistic marketing images. The system then refines the results based on user feedback to ensure the output meets professional quality standards.

- **Quality Evaluation:**  
  Final images are evaluated using metrics (such as SSIM, PSNR, and color histograms) to compare them with the original inputs. Additionally, AI-driven feedback helps further improve image quality.

**Applications:**
- **Digital Advertising & E-Commerce:**  
  Generate stunning visuals for online stores, social media campaigns, and ad banners.
- **Branding & Marketing:**  
  Enhance and standardize visual branding materials across various industries such as fashion, tech, automotive, and real estate.
- **Content Creation:**  
  Streamline production of professional digital content for websites, promotions, and digital signage.

This system provides a seamless, automated solution for creating visually appealing marketing assets, reducing manual effort and ensuring consistency and high quality across all outputs.

In [17]:
!pip install serpapi google-search-results



In [18]:
import os
import csv
import uuid
import re
import time
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup
from PIL import Image
import numpy as np
import cv2  # OpenCV for quality checks

# --- Configuration ---
BASE_DIR = "ai_marketing_test_dataset"
BASE_IMAGES_FOLDER = os.path.join(BASE_DIR, "base_images")
SECONDARY_IMAGES_FOLDER = os.path.join(BASE_DIR, "secondary_images")
METADATA_FILE = os.path.join(BASE_DIR, "image_metadata.csv")

# Quality check parameters
MIN_DIMENSION = 600  # Minimum width/height in pixels
MIN_VARIANCE = 50    # Minimum variance in grayscale pixel values
MIN_LAPLACIAN = 25   # Minimum variance of Laplacian (blurriness check)

# DuckDuckGo and scraping parameters
DUCKDUCKGO_IMAGE_URL = "https://duckduckgo.com/i.js"
MAX_IMAGES_PER_QUERY = 3  # Limit images per query
ALLOWED_DOMAINS = ["unsplash.com", "pexels.com", "pixabay.com"]
REQUEST_TIMEOUT = 10  # Timeout for HTTP requests
RATE_LIMIT_DELAY = 2  # Delay between requests to avoid bans

In [19]:
# --- Image Sources (Queries for Search) ---
INDUSTRY_TEMPLATES = {
    "fashion": {
        "base_queries": [
            "fashion photoshoot background outdoor city", "luxury retail store interior clean",
            "urban street fashion backdrop bokeh", "minimalist white studio background fashion",
            "runway show background lights", "vintage boutique interior",
            "beach fashion shoot background", "mountain landscape fashion backdrop"
        ],
        "secondary_queries": [
            "female fashion model standing studio", "male fashion model walking urban",
            "diverse group of fashion models laughing", "plus size fashion model portrait",
            "model wearing casual outfit outdoor", "model wearing formal dress red carpet",
            "model wearing activewear gym", "model close up face beauty shot",
            "model full body shot isolated", "designer dress on hanger studio",
            "luxury handbag closeup on table", "fashion accessories collection white background",
            "luxury watch on wrist close up detail", "elegant necklace on display stand",
            "stylish mens shirt studio shot", "fashion trousers folded white background",
            "denim jeans stack photography", "pair of sneakers clean background",
            "high heels studio shot", "winter coat on mannequin",
            "swimsuit on model beach", "sunglasses on face close up",
            "scarf draped elegant", "leather wallet photography"
        ]
    },
    "food": {
        "base_queries": [
            "gourmet kitchen countertop high angle marble", "fine dining restaurant table setting",
            "rustic wooden table background bokeh", "outdoor cafe patio sunny",
            "bakery display case vibrant", "farmers market produce display",
            "cozy home kitchen window light"
        ],
        "secondary_queries": [
            "plated gourmet dish close up steak", "appetizing dessert photography chocolate cake",
            "fresh fruit and vegetable colorful arrangement", "refreshing cocktail condensation glass bar",
            "coffee cup steam close up", "artisanal bread on cutting board",
            "soup bowl overhead shot", "pizza slice melted cheese close up",
            "sushi platter arrangement", "bottle of wine with glass",
            "ice cream scoop texture", "raw ingredients selection herbs spices"
        ]
    },
    "tech": {
        "base_queries": [
            "modern office desk setup laptop monitor", "minimalist white workspace clean",
            "futuristic tech lab abstract", "urban technology street view night",
            "server room blue light", "gaming setup neon lights",
            "smart home interior blurred background"
        ],
        "secondary_queries": [
            "smartphone screen close up hand holding", "laptop on modern desk professional setting",
            "smartwatch on wrist active", "wireless earbuds charging case open",
            "mechanical keyboard close up", "gaming mouse and pad",
            "tablet device in hands using stylus", "portable speaker outdoor nature",
            "noise cancelling headphones photography", "dslr camera lens detail",
            "drone flying over landscape", "smart home speaker on counter",
            "external hard drive photography"
        ]
    },
    "beauty": {
        "base_queries": [
            "luxury bathroom vanity clean marble", "serene spa setting with candles plants",
            "natural light portrait background soft focus", "elegant dressing table makeup setup",
            "cosmetic store aisle blurred", "tropical resort spa background"
        ],
        "secondary_queries": [
            "model with clear skin glowing portrait", "makeup palette brushes flat lay",
            "perfume bottle luxury gold", "skincare product texture cream swatch",
            "hair product shampoo bottle studio", "model applying lipstick mirror",
            "model demonstrating face serum application", "nail polish bottles colorful",
            "cosmetic brush set clean background", "bar of soap natural ingredients",
            "essential oil bottle with plants"
        ]
    },
    "automotive": {
        "base_queries": [
            "scenic coastal road sunset drive", "city parking lot night lights",
            "clean automotive showroom floor", "empty desert road daytime",
            "snowy mountain road empty", "industrial warehouse interior"
        ],
        "secondary_queries": [
            "luxury sedan studio shot low angle", "sports car dynamic action shot race track",
            "rugged SUV off-road trail", "car interior dashboard detail modern",
            "motorcycle parked city street", "electric car charging station futuristic",
            "vintage car restored sunny day", "car wheel rim close up",
            "headlights detail night", "convertible car open top"
        ]
    },
    "real_estate": {
        "base_queries": [
            "modern living room interior spacious light", "luxury kitchen interior island",
            "house exterior sunny day landscaped yard", "apartment balcony view cityscape high rise",
            "backyard garden landscape patio", "bedroom interior cozy minimalist",
            "modern bathroom interior clean", "home office setup window view",
            "building facade modern architecture", "rooftop terrace city view"
        ],
        "secondary_queries": [
            "luxury sofa in living room", "decorative abstract art on wall",
            "modern kitchen appliance set stainless steel", "happy family relaxing on sofa",
            "person enjoying balcony view coffee", "dining table setup elegant",
            "comfortable armchair reading nook", "exterior door entryway welcoming",
            "swimming pool luxury backyard", "patio furniture setup"
        ]
    },
    "travel": {
        "base_queries": [
            "tropical beach paradise sunset", "mountain landscape panorama",
            "historic city street europe", "luxury hotel lobby elegant",
            "airplane window view clouds", "cruise ship deck ocean",
            "safari jeep african savannah", "northern lights night sky"
        ],
        "secondary_queries": [
            "traveler with backpack hiking trail", "couple holding hands beach sunset",
            "family vacation photo happy", "suitcase and passport travel",
            "airplane wing view clouds", "hotel room modern",
            "travel guidebook and map", "tourist taking photo landmark",
            "adventure gear camping", "local cuisine street food market"
        ]
    },
    "health": {
        "base_queries": [
            "modern hospital lobby clean", "doctor office examination room",
            "fitness center gym equipment", "yoga studio calm",
            "pharmacy interior shelves", "medical lab research",
            "physical therapy clinic", "dental office waiting room"
        ],
        "secondary_queries": [
            "doctor in white coat stethoscope", "nurse caring for patient",
            "fitness trainer helping client", "yoga instructor demonstrating pose",
            "pharmacist behind counter", "medical researcher in lab",
            "physical therapist assisting patient", "dentist examining patient",
            "healthy meal prep ingredients", "exercise equipment home gym"
        ]
    },
    "education": {
        "base_queries": [
            "modern classroom whiteboard", "university lecture hall students",
            "school library bookshelves", "science lab experiment",
            "art classroom creative", "school playground children playing",
            "online learning virtual classroom", "study group students notebooks"
        ],
        "secondary_queries": [
            "teacher explaining lesson", "student studying book",
            "graduation ceremony cap and gown", "science experiment equipment",
            "art supplies paintbrushes", "children playing on playground",
            "laptop online course", "study notes and textbooks",
            "school bus yellow", "university campus buildings"
        ]
    }
}

GENERAL_QUERIES = {
    "base_queries": [
        "clean white seamless background high resolution studio",
        "gradient studio background soft colors pastel",
        "abstract geometric background modern",
        "blurred bokeh light background warm",
        "minimalist empty room background concrete",
        "simple colored paper background",
        "texture background concrete wall",
        "wooden surface background tabletop"
    ],
    "secondary_queries": [
        "isolated product on transparent background mockup",
        "person full body pose white background standing",
        "still life objects arrangement clean background",
        "single item close up studio white background",
        "cutout person on white background",
        "isolated electronics white background",
        "isolated food item white background",
        "isolated fashion item white background",
        "isolated accessory white background",
        "group of diverse people white background casual"
    ]
}

In [20]:
# --- Helper Functions ---

def init_metadata():
    """Initializes the metadata CSV file with headers if it doesn't exist."""
    if not os.path.exists(METADATA_FILE):
        os.makedirs(BASE_DIR, exist_ok=True)
        os.makedirs(BASE_IMAGES_FOLDER, exist_ok=True)
        os.makedirs(SECONDARY_IMAGES_FOLDER, exist_ok=True)
        with open(METADATA_FILE, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["image_path", "category", "source_url", "industry", "type", "source_query"])
        print(f"Initialized metadata file: {METADATA_FILE}")
    else:
        print(f"Metadata file already exists: {METADATA_FILE}. Processing will add new entries.")

def is_valid_url(url):
    """Checks if a string is a valid URL."""
    if not isinstance(url, str):
        return False
    try:
        result = urlparse(url)
        return bool(result.scheme) and bool(result.netloc)
    except:
        return False

def perform_quality_check(image_path):
    """Performs dimension, variance, and blurriness checks on an image."""
    results = {"passed": False, "reason": "Unknown error"}
    if not os.path.exists(image_path):
        results["reason"] = "File not found"
        return results
    try:
        with Image.open(image_path) as img:
            width, height = img.size
            if width < MIN_DIMENSION or height < MIN_DIMENSION:
                results["reason"] = f"Dimensions too small ({width}x{height} < {MIN_DIMENSION}x{MIN_DIMENSION})"
                return results

        img_cv = cv2.imread(image_path)
        if img_cv is None:
            results["reason"] = "Could not read image with OpenCV"
            return results

        if len(img_cv.shape) == 3:
            img_cv_gray = cv2.cvtColor(img_cv, cv2.COLOR_BGR2GRAY)
        elif len(img_cv.shape) == 2:
            img_cv_gray = img_cv
        else:
            results["reason"] = "Unsupported image format/shape"
            return results

        if img_cv_gray.shape[0] < 3 or img_cv_gray.shape[1] < 3:
            results["reason"] = "Image too small for OpenCV checks"
            return results

        variance = np.var(img_cv_gray)
        if variance < MIN_VARIANCE:
            results["reason"] = f"Variance too low ({variance:.2f} < {MIN_VARIANCE}) - likely near-blank"
            return results

        laplacian_var = cv2.Laplacian(img_cv_gray, cv2.CV_64F).var()
        if laplacian_var < MIN_LAPLACIAN:
            results["reason"] = f"Laplacian variance too low ({laplacian_var:.2f} < {MIN_LAPLACIAN}) - likely blurry"
            return results

        results["passed"] = True
        results["reason"] = "Passed quality checks"
        return results

    except FileNotFoundError:
        results["reason"] = "File not found during quality check"
        return results
    except Exception as e:
        results["reason"] = f"Error during quality check: {e}"
        return results

def create_category_name(query, industry, image_type):
    """Generates a category name from the query, industry, and type."""
    words = query.lower().split()
    meaningful_words = [word for word in words if len(word) > 2 and word not in [
        'a', 'an', 'the', 'in', 'on', 'with', 'for', 'and', 'high', 'low', 'up', 'down',
        'clean', 'white', 'black', 'red', 'blue', 'studio', 'background', 'image',
        'photo', 'photography', 'picture', 'view', 'shot', 'model', 'product',
        'isolated', 'transparent', 'mockup', 'group', 'diverse', 'people', 'casual',
        'setting', 'urban', 'outdoor', 'indoor', 'interior', 'exterior', 'scene', 'backdrop',
        'display', 'arrangement', 'detail', 'collection'
    ]]
    base_name = "_".join(meaningful_words[:4])
    if not base_name:
        base_name = "_".join(query.lower().split()[:3])
        base_name = re.sub(r'[^a-z0-9_]', '', base_name)
    base_name = re.sub(r'[^a-z0-9_]', '', base_name)
    base_name = re.sub(r'_{2,}', '_', base_name).strip('_')
    if not base_name:
        base_name = "misc"
    return f"{industry}_{image_type}_{base_name}"

def load_metadata():
    """Loads existing metadata from the CSV file."""
    if not os.path.exists(METADATA_FILE):
        return []
    with open(METADATA_FILE, "r", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # Skip header
        return list(reader)

def save_metadata(data_rows):
    """Saves metadata rows to the CSV file, overwriting it."""
    os.makedirs(BASE_DIR, exist_ok=True)
    with open(METADATA_FILE, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image_path", "category", "source_url", "industry", "type", "source_query"])
        writer.writerows(data_rows)

def search_and_scrape_images(query, industry, image_type, category, max_images=MAX_IMAGES_PER_QUERY):
    """Searches for images using DuckDuckGo and scrapes from Unsplash, Pexels, Pixabay."""
    # Format query for DuckDuckGo
    query_formatted = query.replace(" ", "+")
    params = {
        "q": query_formatted,
        "l": "us-en",
        "o": "json",
        "t": "h_",
        "f": ",,,license:free"  # Filter for free-to-use images
    }
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    save_folder = BASE_IMAGES_FOLDER if image_type == "base" else SECONDARY_IMAGES_FOLDER
    os.makedirs(save_folder, exist_ok=True)
    new_metadata = []

    try:
        # Search with DuckDuckGo
        response = requests.get(DUCKDUCKGO_IMAGE_URL, params=params, headers=headers, timeout=REQUEST_TIMEOUT)
        response.raise_for_status()
        results = response.json().get("results", [])

        # Filter results for allowed domains
        image_urls = []
        for result in results:
            url = result.get("image", "")
            if not url:
                url = result.get("url", "")
            parsed_url = urlparse(url)
            domain = parsed_url.netloc.lower()
            if any(allowed in domain for allowed in ALLOWED_DOMAINS):
                image_urls.append(url)
                if len(image_urls) >= max_images:
                    break

        if not image_urls:
            print(f"No relevant images found for query: '{query}'")
            return []

        # Scrape each URL for actual image sources
        for url in image_urls:
            try:
                page_response = requests.get(url, headers=headers, timeout=REQUEST_TIMEOUT)
                page_response.raise_for_status()
                soup = BeautifulSoup(page_response.text, "html.parser")
                domain = urlparse(url).netloc.lower()

                img_url = None
                if "unsplash.com" in domain:
                    img_tag = soup.find("img", {"class": re.compile("tB6UZ.*")})
                    img_url = img_tag.get("src") if img_tag else None
                elif "pexels.com" in domain:
                    img_tag = soup.find("img", {"class": re.compile("Photo.*")})
                    img_url = img_tag.get("src") if img_tag else None
                elif "pixabay.com" in domain:
                    img_tag = soup.find("img", {"class": re.compile("image.*")})
                    img_url = img_tag.get("src") if img_tag else None

                if not img_url or not is_valid_url(img_url):
                    print(f"No valid image URL found at {url}")
                    continue

                # Download image
                img_response = requests.get(img_url, headers=headers, timeout=REQUEST_TIMEOUT)
                if img_response.status_code != 200:
                    print(f"Failed to download image from {img_url}")
                    continue

                # Generate unique filename
                file_ext = "jpg" if "jpg" in img_url.lower() else "png"
                filename = f"{category}_{uuid.uuid4().hex[:8]}.{file_ext}"
                image_path = os.path.join(save_folder, filename)

                # Save image
                with open(image_path, "wb") as f:
                    f.write(img_response.content)

                # Add to metadata
                relative_path = os.path.join(save_folder.split(os.sep)[-2:], filename).replace(os.sep, "/")
                new_metadata.append([relative_path, category, url, industry, image_type, query])
                print(f"Downloaded and saved: {relative_path}")

            except Exception as e:
                print(f"Error processing URL {url}: {e}")
                continue

            time.sleep(RATE_LIMIT_DELAY)  # Rate limiting

        return new_metadata

    except Exception as e:
        print(f"Error searching for query '{query}': {e}")
        return []

In [21]:
# --- Main Process ---

def main():
    # Check dependencies
    try:
        import cv2
        print("OpenCV (cv2) is installed. Quality checks will run.")
    except ImportError:
        print("="*70)
        print("!!! OpenCV (cv2) not found. Please install it: pip install opencv-python !!!")
        print("!!! Basic quality checks (variance, laplacian for blur) will be skipped. !!!")
        print("!!! Images will only be checked for basic dimensions.                     !!!")
        print("="*70)
        def perform_quality_check(image_path):
            results = {"passed": False, "reason": "OpenCV not installed"}
            if not os.path.exists(image_path):
                results["reason"] = "File not found"
                return results
            try:
                with Image.open(image_path) as img:
                    width, height = img.size
                    if width < MIN_DIMENSION or height < MIN_DIMENSION:
                        results["reason"] = f"Dimensions too small ({width}x{height} < {MIN_DIMENSION}x{MIN_DIMENSION})"
                        return results
                results["passed"] = True
                results["reason"] = "Passed basic dimension checks"
                return results
            except FileNotFoundError:
                results["reason"] = "File not found during basic check"
                return results
            except Exception as e:
                results["reason"] = f"Error during basic check: {e}"
                return results

    try:
        import requests
        import bs4
        print("Requests and BeautifulSoup4 are installed. Web scraping enabled.")
    except ImportError:
        print("="*70)
        print("!!! Requests or BeautifulSoup4 not found. Please install them: pip install requests beautifulsoup4 !!!")
        print("!!! Web scraping will be skipped. Manual image downloading required. !!!")
        print("="*70)
        def search_and_scrape_images(*args, **kwargs):
            print("Web scraping disabled due to missing dependencies.")
            return []

    init_metadata()

    print("="*80)
    print("AI Marketing Image Dataset Building Process with DuckDuckGo")
    print("="*80)
    print(f"This script uses DuckDuckGo to search for images on Unsplash, Pexels, and Pixabay.")
    print(f"Images will be saved to {BASE_IMAGES_FOLDER} (base) and {SECONDARY_IMAGES_FOLDER} (secondary).")
    print(f"Metadata will be updated in {METADATA_FILE}.")
    print(f"After scraping, you can manually add more images by searching on these websites.")
    print(f"Manual images should be saved to the appropriate folders and added to {METADATA_FILE}.")
    print("Quality checks will validate all images after scraping and manual additions.")
    print("="*80)

    # Collect all queries
    all_queries = []
    print("\n--- Base Image Queries ---")
    print(f"Will save to: {BASE_IMAGES_FOLDER}")
    for industry, data in INDUSTRY_TEMPLATES.items():
        for query in data.get("base_queries", []):
            category = create_category_name(query, industry, "base")
            print(f"- Query: '{query}' (Industry: {industry}, Category: {category})")
            all_queries.append((query, industry, "base", category))
    for query in GENERAL_QUERIES.get("base_queries", []):
        category = create_category_name(query, "general", "base")
        print(f"- Query: '{query}' (Industry: various, Category: {category})")
        all_queries.append((query, "various", "base", category))

    print("\n--- Secondary Image Queries ---")
    print(f"Will save to: {SECONDARY_IMAGES_FOLDER}")
    for industry, data in INDUSTRY_TEMPLATES.items():
        for query in data.get("secondary_queries", []):
            category = create_category_name(query, industry, "secondary")
            print(f"- Query: '{query}' (Industry: {industry}, Category: {category})")
            all_queries.append((query, industry, "secondary", category))
    for query in GENERAL_QUERIES.get("secondary_queries", []):
        category = create_category_name(query, "general", "secondary")
        print(f"- Query: '{query}' (Industry: various, Category: {category})")
        all_queries.append((query, "various", "secondary", category))

    # Perform DuckDuckGo search and scraping
    print("\n--- Searching and Scraping Images with DuckDuckGo ---")
    new_metadata_entries = []
    for query, industry, image_type, category in all_queries:
        print(f"Searching for query: '{query}'")
        entries = search_and_scrape_images(query, industry, image_type, category)
        new_metadata_entries.extend(entries)
        time.sleep(RATE_LIMIT_DELAY)

    # Load existing metadata and append new entries
    existing_data = load_metadata()
    existing_paths = set(row[0] for row in existing_data)
    filtered_new_entries = [entry for entry in new_metadata_entries if entry[0] not in existing_paths]
    updated_data = existing_data + filtered_new_entries

    # Save updated metadata
    save_metadata(updated_data)
    print(f"Added {len(filtered_new_entries)} new images to metadata.")

    print("\n" + "="*80)
    print("Manual Addition Instructions (Optional):")
    print(f"1. Review images in {BASE_IMAGES_FOLDER} and {SECONDARY_IMAGES_FOLDER}.")
    print("2. To add more images, search queries on Unsplash, Pexels, or Pixabay.")
    print(f"3. Save 'base' images to: {BASE_IMAGES_FOLDER}")
    print(f"4. Save 'secondary' images to: {SECONDARY_IMAGES_FOLDER}")
    print(f"5. Update {METADATA_FILE} with new entries using a spreadsheet or text editor.")
    print("   - Columns: image_path, category, source_url, industry, type, source_query")
    input("Press Enter when ready for quality check (or if skipping manual additions).")
    print("="*80)

    print("\n--- Performing Quality Checks ---")
    existing_data = load_metadata()
    if not existing_data:
        print("No image entries found in the metadata file. Please ensure scraping succeeded or add entries manually.")
        return

    passed_checks_count = 0
    failed_checks_count = 0
    processed_paths = set()
    filtered_data = []
    header = ["image_path", "category", "source_url", "industry", "type", "source_query"]
    filtered_data.append(header)

    print(f"Checking {len(existing_data)} entries in {METADATA_FILE}...")
    for i, row in enumerate(existing_data):
        if len(row) != len(header):
            print(f"Skipping malformed row {i+1}: {row}")
            failed_checks_count += 1
            continue

        image_path, category, source_url, industry, image_type, source_query = row
        absolute_image_path = os.path.join(os.path.dirname(METADATA_FILE), image_path)
        if absolute_image_path in processed_paths:
            print(f"Skipping duplicate entry for path: {image_path}")
            failed_checks_count += 1
            continue
        processed_paths.add(absolute_image_path)

        print(f"Checking {image_path}...")
        check_result = perform_quality_check(absolute_image_path)
        if check_result["passed"]:
            print(f"  PASSED.")
            filtered_data.append(row)
            passed_checks_count += 1
        else:
            print(f"  FAILED: {check_result['reason']}. Removing file and metadata entry.")
            failed_checks_count += 1
            try:
                if os.path.exists(absolute_image_path):
                    os.remove(absolute_image_path)
                    print(f"  Removed file: {absolute_image_path}")
            except OSError as e:
                print(f"  Error removing file {absolute_image_path}: {e}")

    print("\n--- Quality Check Complete ---")
    print(f"Processed {len(existing_data)} entries.")
    print(f"Passed quality checks: {passed_checks_count}")
    print(f"Failed quality checks: {failed_checks_count}")

    save_metadata(filtered_data[1:])

    print(f"Metadata file updated: {METADATA_FILE} now contains {passed_checks_count} valid image entries.")

    print("\n--- Final Dataset Summary ---")
    final_base_count = sum(1 for row in filtered_data[1:] if row[4] == "base")
    final_secondary_count = sum(1 for row in filtered_data[1:] if row[4] == "secondary")
    print(f"Total valid images in dataset: {passed_checks_count}")
    print(f"  - Base images: {final_base_count}")
    print(f"  - Secondary images: {final_secondary_count}")
    print(f"Dataset images stored in: {BASE_IMAGES_FOLDER} and {SECONDARY_IMAGES_FOLDER}")
    print(f"Dataset metadata stored in: {METADATA_FILE}")
    print("="*80)

if __name__ == "__main__":
    main()

OpenCV (cv2) is installed. Quality checks will run.
Requests and BeautifulSoup4 are installed. Web scraping enabled.
Metadata file already exists: ai_marketing_test_dataset/image_metadata.csv. Processing will add new entries.
AI Marketing Image Dataset Building Process with DuckDuckGo
This script uses DuckDuckGo to search for images on Unsplash, Pexels, and Pixabay.
Images will be saved to ai_marketing_test_dataset/base_images (base) and ai_marketing_test_dataset/secondary_images (secondary).
Metadata will be updated in ai_marketing_test_dataset/image_metadata.csv.
After scraping, you can manually add more images by searching on these websites.
Manual images should be saved to the appropriate folders and added to ai_marketing_test_dataset/image_metadata.csv.
Quality checks will validate all images after scraping and manual additions.

--- Base Image Queries ---
Will save to: ai_marketing_test_dataset/base_images
- Query: 'fashion photoshoot background outdoor city' (Industry: fashion, 

UnboundLocalError: cannot access local variable 'search_and_scrape_images' where it is not associated with a value