<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Product_Marketing_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Sample Data for AI Product Marketing System for Image Generation
# Product Marketing AI System

**Overview:**

The Product Marketing AI System automatically creates high-quality marketing visuals by processing, enhancing, and refining input images.

**Key Steps:**
- **Input & Preprocessing:**  
  Users provide a base image (e.g., a background or scene) and a secondary image (e.g., a product or model). The system enhances these images, adjusts resolution and contrast, and generates segmentation masks.

- **Context & Prompt Generation:**  
  It analyzes the images to extract contextual details and automatically generates a detailed, structured marketing prompt tailored to the product and target audience.

- **Image Generation & Refinement:**  
  Advanced AI models use the prompt to generate photorealistic marketing images. The system then refines the results based on user feedback to ensure the output meets professional quality standards.

- **Quality Evaluation:**  
  Final images are evaluated using metrics (such as SSIM, PSNR, and color histograms) to compare them with the original inputs. Additionally, AI-driven feedback helps further improve image quality.

**Applications:**
- **Digital Advertising & E-Commerce:**  
  Generate stunning visuals for online stores, social media campaigns, and ad banners.
- **Branding & Marketing:**  
  Enhance and standardize visual branding materials across various industries such as fashion, tech, automotive, and real estate.
- **Content Creation:**  
  Streamline production of professional digital content for websites, promotions, and digital signage.

This system provides a seamless, automated solution for creating visually appealing marketing assets, reducing manual effort and ensuring consistency and high quality across all outputs.

In [1]:
!pip install serpapi google-search-results



In [7]:
import os
import requests
import json
import uuid
import csv
import time
import re
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup
from PIL import Image
import numpy as np
import cv2
from serpapi import GoogleSearch

BASE_DIR = "ai_marketing_test_dataset"
BASE_IMAGES_FOLDER = os.path.join(BASE_DIR, "base_images")
SECONDARY_IMAGES_FOLDER = os.path.join(BASE_DIR, "secondary_images")
METADATA_FILE = os.path.join(BASE_DIR, "image_metadata.csv")
os.makedirs(BASE_IMAGES_FOLDER, exist_ok=True)
os.makedirs(SECONDARY_IMAGES_FOLDER, exist_ok=True)

SERPAPI_KEY = "95c2797a69b167639c98ab054e8597d752c6fe6d"
IMAGES_PER_SOURCE_QUERY = 20  # Increased to download more images per query
MIN_DIMENSION = 600
MIN_VARIANCE = 50
MIN_LAPLACIAN = 25
SEARCH_DELAY_SECONDS = 4

In [8]:
INDUSTRY_TEMPLATES = {
    "fashion": {
        "base_queries": [
            "fashion photoshoot background outdoor city",
            "luxury retail store interior clean",
            "urban street fashion backdrop bokeh",
            "minimalist white studio background fashion",
            "runway show background lights",
            "vintage boutique interior",
            "beach fashion shoot background",
            "mountain landscape fashion backdrop"
        ],
        "secondary_queries": [
            "female fashion model standing studio",
            "male fashion model walking urban",
            "diverse group of fashion models laughing",
            "plus size fashion model portrait",
            "model wearing casual outfit outdoor",
            "model wearing formal dress red carpet",
            "model wearing activewear gym",
            "model close up face beauty shot",
            "model full body shot isolated",
            "designer dress on hanger studio",
            "luxury handbag closeup on table",
            "fashion accessories collection white background",
            "luxury watch on wrist close up detail",
            "elegant necklace on display stand",
            "stylish mens shirt studio shot",
            "fashion trousers folded white background",
            "denim jeans stack photography",
            "pair of sneakers clean background",
            "high heels studio shot",
            "winter coat on mannequin",
            "swimsuit on model beach",
            "sunglasses on face close up",
            "scarf draped elegant",
            "leather wallet photography"
        ]
    },
    "food": {
        "base_queries": [
            "gourmet kitchen countertop high angle marble",
            "fine dining restaurant table setting",
            "rustic wooden table background bokeh",
            "outdoor cafe patio sunny",
            "bakery display case vibrant",
            "farmers market produce display",
            "cozy home kitchen window light"
        ],
        "secondary_queries": [
            "plated gourmet dish close up steak",
            "appetizing dessert photography chocolate cake",
            "fresh fruit and vegetable colorful arrangement",
            "refreshing cocktail condensation glass bar",
            "coffee cup steam close up",
            "artisanal bread on cutting board",
            "soup bowl overhead shot",
            "pizza slice melted cheese close up",
            "sushi platter arrangement",
            "bottle of wine with glass",
            "ice cream scoop texture",
            "raw ingredients selection herbs spices"
        ]
    },
    "tech": {
        "base_queries": [
            "modern office desk setup laptop monitor",
            "minimalist white workspace clean",
            "futuristic tech lab abstract",
            "urban technology street view night",
            "server room blue light",
            "gaming setup neon lights",
            "smart home interior blurred background"
        ],
        "secondary_queries": [
            "smartphone screen close up hand holding",
            "laptop on modern desk professional setting",
            "smartwatch on wrist active",
            "wireless earbuds charging case open",
            "mechanical keyboard close up",
            "gaming mouse and pad",
            "tablet device in hands using stylus",
            "portable speaker outdoor nature",
            "noise cancelling headphones photography",
            "dslr camera lens detail",
            "drone flying over landscape",
            "smart home speaker on counter",
            "external hard drive photography"
        ]
    },
    "beauty": {
        "base_queries": [
            "luxury bathroom vanity clean marble",
            "serene spa setting with candles plants",
            "natural light portrait background soft focus",
            "elegant dressing table makeup setup",
            "cosmetic store aisle blurred",
            "tropical resort spa background"
        ],
        "secondary_queries": [
            "model with clear skin glowing portrait",
            "makeup palette brushes flat lay",
            "perfume bottle luxury gold",
            "skincare product texture cream swatch",
            "hair product shampoo bottle studio",
            "model applying lipstick mirror",
            "model demonstrating face serum application",
            "nail polish bottles colorful",
            "cosmetic brush set clean background",
            "bar of soap natural ingredients",
            "essential oil bottle with plants"
        ]
    },
    "automotive": {
        "base_queries": [
            "scenic coastal road sunset drive",
            "city parking lot night lights",
            "clean automotive showroom floor",
            "empty desert road daytime",
            "snowy mountain road empty",
            "industrial warehouse interior"
        ],
        "secondary_queries": [
            "luxury sedan studio shot low angle",
            "sports car dynamic action shot race track",
            "rugged SUV off-road trail",
            "car interior dashboard detail modern",
            "motorcycle parked city street",
            "electric car charging station futuristic",
            "vintage car restored sunny day",
            "car wheel rim close up",
            "headlights detail night",
            "convertible car open top"
        ]
    },
    "real_estate": {
        "base_queries": [
            "modern living room interior spacious light",
            "luxury kitchen interior island",
            "house exterior sunny day landscaped yard",
            "apartment balcony view cityscape high rise",
            "backyard garden landscape patio",
            "bedroom interior cozy minimalist",
            "modern bathroom interior clean",
            "home office setup window view",
            "building facade modern architecture",
            "rooftop terrace city view"
        ],
        "secondary_queries": [
            "luxury sofa in living room",
            "decorative abstract art on wall",
            "modern kitchen appliance set stainless steel",
            "happy family relaxing on sofa",
            "person enjoying balcony view coffee",
            "dining table setup elegant",
            "comfortable armchair reading nook",
            "exterior door entryway welcoming",
            "swimming pool luxury backyard",
            "patio furniture setup"
        ]
    },
    "travel": {
        "base_queries": [
            "tropical beach paradise sunset",
            "mountain landscape panorama",
            "historic city street europe",
            "luxury hotel lobby elegant",
            "airplane window view clouds",
            "cruise ship deck ocean",
            "safari jeep african savannah",
            "northern lights night sky"
        ],
        "secondary_queries": [
            "traveler with backpack hiking trail",
            "couple holding hands beach sunset",
            "family vacation photo happy",
            "suitcase and passport travel",
            "airplane wing view clouds",
            "hotel room modern",
            "travel guidebook and map",
            "tourist taking photo landmark",
            "adventure gear camping",
            "local cuisine street food market"
        ]
    },
    "health": {
        "base_queries": [
            "modern hospital lobby clean",
            "doctor office examination room",
            "fitness center gym equipment",
            "yoga studio calm",
            "pharmacy interior shelves",
            "medical lab research",
            "physical therapy clinic",
            "dental office waiting room"
        ],
        "secondary_queries": [
            "doctor in white coat stethoscope",
            "nurse caring for patient",
            "fitness trainer helping client",
            "yoga instructor demonstrating pose",
            "pharmacist behind counter",
            "medical researcher in lab",
            "physical therapist assisting patient",
            "dentist examining patient",
            "healthy meal prep ingredients",
            "exercise equipment home gym"
        ]
    },
    "education": {
        "base_queries": [
            "modern classroom whiteboard",
            "university lecture hall students",
            "school library bookshelves",
            "science lab experiment",
            "art classroom creative",
            "school playground children playing",
            "online learning virtual classroom",
            "study group students notebooks"
        ],
        "secondary_queries": [
            "teacher explaining lesson",
            "student studying book",
            "graduation ceremony cap and gown",
            "science experiment equipment",
            "art supplies paintbrushes",
            "children playing on playground",
            "laptop online course",
            "study notes and textbooks",
            "school bus yellow",
            "university campus buildings"
        ]
    }
}

GENERAL_QUERIES = {
    "base_queries": [
        "clean white seamless background high resolution studio",
        "gradient studio background soft colors pastel",
        "abstract geometric background modern",
        "blurred bokeh light background warm",
        "minimalist empty room background concrete",
        "simple colored paper background",
        "texture background concrete wall",
        "wooden surface background tabletop"
    ],
    "secondary_queries": [
        "isolated product on transparent background mockup",
        "person full body pose white background standing",
        "still life objects arrangement clean background",
        "single item close up studio white background",
        "cutout person on white background",
        "isolated electronics white background",
        "isolated food item white background",
        "isolated fashion item white background",
        "isolated accessory white background",
        "group of diverse people white background casual"
    ]
}

In [None]:
def init_metadata():
    if not os.path.exists(METADATA_FILE):
        with open(METADATA_FILE, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["image_path", "category", "source_url", "industry", "type", "source_query"])
        print(f"Initialized metadata file: {METADATA_FILE}")
    else:
        print(f"Metadata file already exists: {METADATA_FILE}. Appending data.")

def is_valid(url):
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def extract_images_from_html(html, base_url):
    soup = BeautifulSoup(html, "html.parser")
    image_urls = []
    for img in soup.find_all("img"):
        img_url = img.get("src")
        if not img_url:
            continue
        img_url = urljoin(base_url, img_url)
        if is_valid(img_url) and any(img_url.lower().endswith(ext) for ext in [".jpg", ".jpeg", ".png"]):
            image_urls.append(img_url)
    return image_urls

def perform_quality_check(image_path):
    try:
        with Image.open(image_path) as img:
            if img.size[0] < MIN_DIMENSION or img.size[1] < MIN_DIMENSION:
                print(f"Removed {image_path}: Below min dimensions ({img.size[0]}x{img.size[1]} < {MIN_DIMENSION}x{MIN_DIMENSION})")
                return False
            img_cv_gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            if img_cv_gray is None:
                print(f"Removed {image_path}: Could not read image with OpenCV (corrupt or invalid).")
                return False
            if img_cv_gray.shape[0] < 3 or img_cv_gray.shape[1] < 3 or np.prod(img_cv_gray.shape) < 100:
                print(f"Removed {image_path}: Image too small or dimensions invalid for checks ({img_cv_gray.shape[1]}x{img_cv_gray.shape[0]})")
                return False
            variance = np.var(img_cv_gray)
            laplacian = cv2.Laplacian(img_cv_gray, cv2.CV_64F).var()
            if variance < MIN_VARIANCE or laplacian < MIN_LAPLACIAN:
                print(f"Removed {image_path}: Low quality (variance={variance:.2f} < {MIN_VARIANCE} or Laplacian={laplacian:.2f} < {MIN_LAPLACIAN})")
                return False
            return True
    except Exception as e:
        print(f"Error during quality check for {image_path}: {e}")
        return False

def download_image(url, folder, category, industry, image_type, source_query):
    if not is_valid(url):
        return None
    if re.search(r'\.(svg|gif)$', url.lower()) or "base64" in url.lower() or url.endswith('/') or url.endswith('#') or 'placeholder' in url.lower():
        return None
    save_path = None
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
        response = requests.get(url, stream=True, timeout=25, headers=headers)
        response.raise_for_status()
        content_type = response.headers.get('content-type', '').lower()
        ext = ".jpg"
        if 'jpeg' in content_type:
            ext = ".jpg"
        elif 'png' in content_type:
            ext = ".png"
        elif 'gif' in content_type or 'svg' in content_type:
            print(f"Skipping GIF/SVG: {url}")
            return None
        elif 'image' in content_type:
            url_path = urlparse(url).path
            url_ext = os.path.splitext(url_path)[1].lower()
            if url_ext in ['.jpg', '.jpeg', '.png']:
                ext = url_ext
            else:
                if 'jpeg' in content_type: ext = ".jpg"
                elif 'png' in content_type: ext = ".png"
                else:
                    print(f"Skipping unknown generic image type based on URL ext '{url_ext}' and header '{content_type}': {url}")
                    return None
        else:
            print(f"Skipping non-image content type '{content_type}': {url}")
            return None
        image_name = f"{uuid.uuid4()}{ext}"
        save_path = os.path.join(folder, image_name)
        with open(save_path, "wb") as f:
            downloaded_size = 0
            for chunk in response.iter_content(8192):
                f.write(chunk)
                downloaded_size += len(chunk)
                if downloaded_size > 10 * 1024 * 1024:
                    print(f"Removed {save_path}: Exceeded 10MB size limit.")
                    response.close()
                    os.remove(save_path)
                    return None
        if perform_quality_check(save_path):
            with open(METADATA_FILE, "a", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow([save_path, category, url, industry, image_type, source_query])
            return save_path
        else:
            if os.path.exists(save_path):
                os.remove(save_path)
            return None
    except requests.exceptions.RequestException as e:
        if save_path and os.path.exists(save_path):
            try: os.remove(save_path)
            except OSError: pass
        return None
    except Exception as e:
        print(f"An unexpected error occurred with {url}: {e}")
        if save_path and os.path.exists(save_path):
            try: os.remove(save_path)
            except OSError: pass
        return None

def google_image_search(query, limit=IMAGES_PER_SOURCE_QUERY):
    if not SERPAPI_KEY or SERPAPI_KEY == "YOUR_SERPAPI_KEY":
        print("SerpAPI key not configured. Skipping Google Image Search.")
        return []
    print(f"Searching Google Images for: {query} (limit {limit})")
    try:
        params = {
            "q": query,
            "tbm": "isch",
            "ijn": "0",
            "api_key": SERPAPI_KEY,
            "safe": "active",
            "num": limit
        }
        search = GoogleSearch(params)
        results = search.get_dict()
        image_urls = []
        for img in results.get("images_results", []):
            url = img.get("original")
            if url and is_valid(url):
                image_urls.append(url)
        print(f"Found {len(image_urls)} potential results for '{query}'. Attempting to download...")
        return image_urls
    except Exception as e:
        print(f"Error fetching Google Images for '{query}': {e}")
        error = str(e).lower()
        if "invalid api key" in error:
            print("SERPAPI_KEY seems invalid. Please check your key.")
        elif "daily limit exceeded" in error or "account has insufficient funds" in error:
            print("SERPAPI daily limit exceeded or other account issue. Please check your SerpAPI account.")
        elif "quota" in error:
            print("SerpAPI quota issue.")
        else:
            print(f"Unknown SerpAPI error: {e}")
        return []

def create_category_name(query, industry, image_type):
    words = query.lower().split()
    meaningful_words = [word for word in words if len(word) > 2 and word not in ['a', 'an', 'the', 'in', 'on', 'with', 'for', 'and', 'high', 'low', 'up', 'down', 'clean', 'white', 'black', 'red', 'blue']]
    base_name = "_".join(meaningful_words[:4])
    if not base_name:
        base_name = "misc"
    base_name = re.sub(r'[^a-z0-9_]', '', base_name)
    base_name = re.sub(r'_{2,}', '_', base_name).strip('_')
    return f"{industry}_{image_type}_{base_name}"

def main():
    init_metadata()
    image_sources_list = []
    for industry, data in INDUSTRY_TEMPLATES.items():
        for query in data.get("base_queries", []):
            category = create_category_name(query, industry, "base")
            image_sources_list.append(("google", query, category, industry, "base"))
        for query in data.get("secondary_queries", []):
            category = create_category_name(query, industry, "secondary")
            image_sources_list.append(("google", query, category, industry, "secondary"))
    for query in GENERAL_QUERIES.get("base_queries", []):
        category = create_category_name(query, "general", "base")
        image_sources_list.append(("google", query, category, "various", "base"))
    for query in GENERAL_QUERIES.get("secondary_queries", []):
        category = create_category_name(query, "general", "secondary")
        image_sources_list.append(("google", query, category, "various", "secondary"))
    all_downloaded_paths = {"base": [], "secondary": []}
    print(f"Starting image download process. Targeting approx. {IMAGES_PER_SOURCE_QUERY} valid images per source query.")
    print(f"Defined {len(image_sources_list)} distinct source queries.")
    for source_type, source_query, category, industry, image_type in image_sources_list:
        print(f"\nProcessing {image_type} source for '{industry}' category '{category}': '{source_query}'")
        folder = BASE_IMAGES_FOLDER if image_type == "base" else SECONDARY_IMAGES_FOLDER
        downloaded_count_for_source = 0
        if source_type == "google":
            image_urls = google_image_search(source_query, limit=IMAGES_PER_SOURCE_QUERY)
            if not image_urls:
                print(f"No image URLs found or error for query: '{source_query}'")
                time.sleep(SEARCH_DELAY_SECONDS / 2)
                continue
            urls_to_download = image_urls[:IMAGES_PER_SOURCE_QUERY * 2]
            for url in urls_to_download:
                if downloaded_count_for_source >= IMAGES_PER_SOURCE_QUERY:
                    print(f"Reached target count ({IMAGES_PER_SOURCE_QUERY}) for query '{source_query}'. Stopping downloads for this query.")
                    break
                path = download_image(url, folder, category, industry, image_type, source_query)
                if path:
                    all_downloaded_paths[image_type].append(path)
                    downloaded_count_for_source += 1
                time.sleep(0.4)
            print(f"Finished processing query '{source_query}'. Downloaded {downloaded_count_for_source} valid images.")
        time.sleep(SEARCH_DELAY_SECONDS)
    summary = {
        "base_images_downloaded": len(all_downloaded_paths["base"]),
        "secondary_images_downloaded": len(all_downloaded_paths["secondary"]),
        "total_images_downloaded": len(all_downloaded_paths["base"]) + len(all_downloaded_paths["secondary"]),
        "base_image_paths": all_downloaded_paths["base"],
        "secondary_image_paths": all_downloaded_paths["secondary"]
    }
    summary_file_path = os.path.join(BASE_DIR, "download_summary.json")
    with open(summary_file_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=4)
    print("\n--- Dataset Building Complete ---")
    print(f"Summary saved to: {summary_file_path}")
    print(json.dumps(summary, indent=4))
    print(f"Detailed metadata saved to: {METADATA_FILE}")
    print(f"Images saved to: {BASE_IMAGES_FOLDER} and {SECONDARY_IMAGES_FOLDER}")

if __name__ == "__main__":
    main()

Initialized metadata file: ai_marketing_test_dataset/image_metadata.csv
Starting image download process. Targeting approx. 20 valid images per source query.
Defined 196 distinct source queries.

Processing base source for 'fashion' category 'fashion_base_fashion_photoshoot_background_outdoor': 'fashion photoshoot background outdoor city'
Searching Google Images for: fashion photoshoot background outdoor city (limit 20)
Found 0 potential results for 'fashion photoshoot background outdoor city'. Attempting to download...
No image URLs found or error for query: 'fashion photoshoot background outdoor city'

Processing base source for 'fashion' category 'fashion_base_luxury_retail_store_interior': 'luxury retail store interior clean'
Searching Google Images for: luxury retail store interior clean (limit 20)
Found 0 potential results for 'luxury retail store interior clean'. Attempting to download...
No image URLs found or error for query: 'luxury retail store interior clean'

Processing bas