## VAESTA Project: Personalized Garment Dataset Creation

This notebook uses the Gemini Vision model to analyze user-provided images of single clothing items, extract visual attributes, and compute derived features (like warmth, comfort, and layering scores) to create a rich, structured JSON dataset.

0. Initial Setup

Install Dependencies (Run this cell once)

In [None]:
!pip install pillow google-generativeai tqdm pandas --quiet

Imports and Configuration

In [14]:
import os
import json
import re
from pathlib import Path
from PIL import Image
from tqdm import tqdm
import google.generativeai as genai

# --- Configuration ---
GEMINI_API_KEY = "AIzaSyAKL82W14BBO8A2mObYj7UB6AxI9FY6LGs"
IMAGE_DIR = Path("simulated_wardrobes/Female_Wardrobe/")
# IMAGE_DIR = Path("simulated_wardrobes/Male_Wardrobe")
OUTPUT_FILE = Path("personalized_clothing_dataset.json")

# Ensure image directory exists
if not IMAGE_DIR.is_dir():
    print(f"Creating image directory at {IMAGE_DIR}. Please add your images now.")
    IMAGE_DIR.mkdir(exist_ok=True)

# --- Configure Gemini ---
try:
    genai.configure(api_key=GEMINI_API_KEY)
    model = genai.GenerativeModel("gemini-2.5-flash-lite")
    print("Gemini Model Configured (gemini-2.5-flash-lite)")
except Exception as e:
    print(f"Error configuring Gemini: {e}")
    # Exit or handle error if API key is invalid

Gemini Model Configured (gemini-2.5-flash-lite)


1. Feature Engineering Logic

These functions translate the raw visual attributes extracted by Gemini (Category, Material, Pattern) into the required quantitative scores (Warmth, Impermeability, Comfort, and Layering).

In [15]:
# Mapping to determine basic garment type
OUTER_GARMENTS = ["coat", "jacket", "cardigan", "blazer", "hoodie"]

def determine_outer_inner(category):
    """Classifies a garment as 'outer' or 'inner' based on category."""
    category = category.lower()
    if category in OUTER_GARMENTS:
        return "outer"
    elif category in ["dress", "skirt", "pants", "shorts", "shoes", "accessory"]:
        return "not-applicable" # E.g., not an upper body layer
    return "inner" # Default for t-shirts, shirts, sweaters, etc.

def compute_warmth_score(material, category):
    """Calculates a warmth score (1-5) based on material and garment type."""
    fabric_scores = {"denim":3,"cotton":2,"leather":4,"furry":5,"wool":5,"knit":4,"chiffon":1,"synthetic":3,"silk":2,"linen":1,"other":2}
    
    base_score = fabric_scores.get(material.lower(), 2)
    
    if category.lower() in OUTER_GARMENTS:
        base_score += 2 # Outer garments typically add more warmth
    elif category.lower() == "dress":
        base_score += 1 # Dresses cover a large area
        
    return min(max(1, base_score), 5) # Scale to 1-5

def compute_impermeability_score(material):
    """Calculates an impermeability score (1-3)."""
    material = material.lower()
    if material in ["leather", "synthetic"]: 
        return 3
    if material in ["denim"]:
        return 2
    return 1

def compute_comfort_score(material, pattern):
    """Calculates a comfort score (1-5)."""
    score = 0
    # Material comfort
    material = material.lower()
    if material in ["cotton", "knit", "silk"]: score += 2
    elif material in ["leather", "denim"]: score += 1
    
    # Pattern/Style Comfort (solid/no pattern is often more casual/comfortable)
    if pattern.lower() in ["pure color", "none"]: score += 1
    
    return min(max(1, score), 5) # Scale to 1-5

def compute_layering_score(garment_type):
    """Calculates a layering score (1-5) based on how easily it can be layered."""
    if garment_type == "outer":
        return 5 # Designed to be worn over, high layering potential
    if garment_type == "inner":
        return 4 # Designed to be worn under, good layering potential
    return 2 # Not a traditional layer (e.g., pants, shoes)

2. Gemini Vision Prompt

This prompt is crucial. It instructs the model to act as a clothing expert, analyze the image, and return a clean, structured JSON object containing all necessary visual attributes.

In [16]:
PROMPT = """
You are analyzing a photo of a *single* clothing garment.

TASK 1: Extract the garment's visual attributes and shape details.
TASK 2: Return ALL details in a single, valid JSON object.

Rules:
- Assume the image contains only ONE primary garment.
- For shape, focus on the garment itself (e.g., sleeve type, fit).
- If an attribute is not clearly visible or applicable (e.g., 'sleeve' on pants), use "none".

Return ONLY valid JSON:

{
  "category": "t-shirt | button-up shirt | sweater | coat | jacket | jeans | trousers | shorts | skirt | dress | shoes | accessory",
  "material": "cotton | denim | leather | synthetic (nylon/polyester) | wool | knit | silk | linen | other | none",
  "color": "dominant color name or pattern (e.g., 'light blue', 'red and white')",
  "pattern": "pure color (solid) | floral | graphic (logo/text) | striped | plaid | none",
  "shape_details": {
    "sleeve": "long-sleeve | short-sleeve | sleeveless | none",
    "neckline": "crew-neck | v-neck | collar | hoodie | none",
    "fit": "slim | regular | oversized | tailored | none"
  },
  "notes": "short sentence describing the garment, e.g., 'A thick, oversized wool sweater.'"
}
"""

3. Image Processing Loop

This cell iterates through all images in the my_clothing_images/ folder, calls the Gemini model, and applies the feature engineering logic to compile the final dataset.

In [17]:
import time

final_results = []
all_images = list(IMAGE_DIR.glob("*.jpg"))

if not all_images:
    print(f"No images found in {IMAGE_DIR}. Please check the folder.")

print(f"Found images to analyze: {len(all_images)}")

for img_path in tqdm(all_images, desc="Analyzing images"):
    image_link = str(img_path)
    
    try:
        img = Image.open(img_path)

        # 1. Send to Gemini Vision
        response = model.generate_content([PROMPT, img])
        text = response.text
        
        # 2. Extract JSON
        match = re.search(r"{.*}", text, re.DOTALL)
        if not match:
            print(f"Skipping {img_path.name}: Failed to extract JSON.")
            continue
            
        gemini_data = json.loads(match.group())

        # 3. Compute derived features
        category = gemini_data.get("category", "unknown")
        material = gemini_data.get("material", "none")
        
        garment_type = determine_outer_inner(category)
        warmth_score = compute_warmth_score(material, category)
        impermeability_score = compute_impermeability_score(material)
        comfort_score = compute_comfort_score(material, gemini_data.get("pattern", "none"))
        layering_score = compute_layering_score(garment_type)
        
        # 4. Build final dataset row
        final_row = {
            "image_link": image_link,
            "category": category,
            "outer_inner": garment_type,
            "shape": gemini_data.get("shape_details", {}),
            "material": material,
            "color": gemini_data.get("color", "unknown"),
            "pattern": gemini_data.get("pattern", "none"),
            "warmth_score": warmth_score,
            "layering_score": layering_score,
            "impermeability_score": impermeability_score,
            "comfort_score": comfort_score,
            "notes": gemini_data.get("notes", "")
        }

        final_results.append(final_row)
        time.sleep(10)  # To respect rate limits

    except Exception as e:
        print(f"Error processing {img_path.name}: {e}")
        continue

print(f"\n Finished processing! Garment records created: {len(final_results)}")

Found images to analyze: 26


Analyzing images:  81%|████████  | 21/26 [04:44<00:50, 10.09s/it]

Error processing Bottoms_01_Celana_Panjang_belstaff_england_paint_splatte_1694232947_1b8f2440_progressive_thumbnail.jpg: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 14.064960076s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, 

Analyzing images:  85%|████████▍ | 22/26 [04:44<00:28,  7.13s/it]

Error processing Tops_08_Kaos_8d3ec813-014f-4a8b-ab08-bc21dfb9823e.jpg: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 13.818671439s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 13
}
]


Analyzing images:  88%|████████▊ | 23/26 [04:45<00:15,  5.07s/it]

Error processing Tops_06_Kaos_8eb3e51e-df75-43a5-be62-08e635a20edc.jpg: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 13.570273992s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 13
}
]


Analyzing images:  96%|█████████▌| 25/26 [04:45<00:02,  2.58s/it]

Error processing Tops_02_Polo_kaos_polo_kerah_lengan_panjang_1688219770_7574d94a_progressive_thumbnail.jpg: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 13.36456015s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {


Analyzing images: 100%|██████████| 26/26 [04:45<00:00, 10.99s/it]

Error processing Tops_03_Kemeja_jual_kemeja_casual_salur_comme_1678076770_ca129136_progressive_thumbnail.jpg: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/usage?tab=rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash-lite
Please retry in 12.962835099s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash-lite"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay




4. Save and Review Dataset

Finally, the results are saved to a JSON file and the first record is printed for a quick check.

In [18]:
# --- SAVE DATASET AS JSON ---
if final_results:
    with open(OUTPUT_FILE, 'w') as f:
        json.dump(final_results, f, indent=4)

    print(f"Dataset successfully saved to {OUTPUT_FILE}")

    # Display a sample of the results
    print("\n--- Sample Record ---")
    print(json.dumps(final_results[0], indent=4))
else:
    print("Dataset not created as no images were processed successfully.")

Dataset successfully saved to personalized_clothing_dataset.json

--- Sample Record ---
{
    "image_link": "simulated_wardrobes/Female_Wardrobe/Tops_04_Kaos_2e661f7d-0f22-4263-9ce7-de94ee5f5a7c.jpg",
    "category": "t-shirt",
    "outer_inner": "inner",
    "shape": {
        "sleeve": "short-sleeve",
        "neckline": "crew-neck",
        "fit": "regular"
    },
    "material": "cotton",
    "color": "green",
    "pattern": "graphic (logo/text)",
    "warmth_score": 2,
    "layering_score": 4,
    "impermeability_score": 1,
    "comfort_score": 2,
    "notes": "A green t-shirt with a graphic print of a shark and text."
}
