# VICTGOAL Bike Helmet: Review Analysis & Image Generation

This notebook consolidates the analysis for the Final Project, covering:
1.  **Q2:** Analyzing customer reviews using Embeddings, Clustering, and LLM Feature Extraction.
2.  **Q3:** Generating product images using Stable Diffusion based on the extracted insights.

## Part 1: Setup & Data Loading

In [None]:
import pandas as pd
import numpy as np
import json
import os
import matplotlib.pyplot as plt
import seaborn as sns
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import faiss
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

# Load Reviews
df = pd.read_csv('reviews.csv')
print(f"Loaded {len(df)} reviews")

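# Clean Data: drop rows whose review text is missing or blank
df['review_text'] = df['tl-m'].fillna('').astype(str).str.strip()
df = df[df['review_text'] != ''].reset_index(drop=True)
reviews = df['review_text'].tolist()
print(f"Processed {len(reviews)} valid reviews")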

## Part 2: Embedding & Clustering

In [None]:
# Generate Embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(reviews)
print("Generated embeddings with shape:", embeddings.shape)

# K-Means Clustering
num_clusters = 8
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)  # explicit n_init keeps results consistent across scikit-learn versions
cluster_labels = kmeans.fit_predict(embeddings)
df['cluster'] = cluster_labels

# Visualize Distribution
plt.figure(figsize=(10, 6))
sns.countplot(x='cluster', data=df, palette='viridis')
plt.title('Review Cluster Distribution')
plt.xlabel('Cluster ID')
plt.ylabel('Number of Reviews')
plt.show()
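
The cluster count above is fixed at 8, and `silhouette_score` is imported but never used. One way to sanity-check that choice is to compare silhouette scores over a range of candidate values. The sketch below is an illustration on synthetic data, not the notebook's actual procedure; the helper name `pick_k_by_silhouette` and the demo blobs are assumptions, and in the notebook `embeddings` would take the place of `demo_vectors`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(vectors, candidate_ks, random_state=42):
    """Fit K-Means for each candidate k and return (best_k, {k: silhouette})."""
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(vectors)
        # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
        scores[k] = float(silhouette_score(vectors, labels))
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the review embeddings: three well separated blobs.
demo_vectors, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
best_k, scores = pick_k_by_silhouette(demo_vectors, range(2, 7))
print("Best k by silhouette:", best_k)
```

Real review embeddings rarely separate this cleanly, so the scores are better read as a rough guide than as a definitive answer.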

## Part 3: LLM Feature Extraction (RAG)
We use FAISS to retrieve the most relevant reviews and feed them to GPT-4o to extract visual features.
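
To make the retrieval step concrete: `faiss.IndexFlatL2` performs an exact brute-force search and returns squared Euclidean distances. The NumPy sketch below reproduces that ranking on a toy corpus. The function name `top_k_l2` and the toy vectors are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def top_k_l2(corpus, query, k):
    """Return (squared_distances, indices) of the k corpus rows nearest to query."""
    k = min(k, len(corpus))  # never ask for more neighbours than there are rows
    diffs = corpus - query  # broadcasts the query across every corpus row
    squared = np.einsum("ij,ij->i", diffs, diffs)  # row-wise squared L2 distance
    order = np.argsort(squared)[:k]
    return squared[order], order


# Toy 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
corpus = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype="float32")
query = np.array([0.9, 0.0], dtype="float32")

distances, indices = top_k_l2(corpus, query, k=2)
print(indices)
```

Capping `k` at the corpus size matters for the FAISS version too, since FAISS pads the result with `-1` indices when `k` exceeds the number of indexed vectors.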

In [None]:
# Build FAISS Index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings.astype('float32'))

# Search for visual details
query = "visual appearance design materials aesthetic look style"
query_vector = model.encode([query])
k = min(500, len(reviews))  # FAISS pads results with -1 when k exceeds the index size
D, I = index.search(query_vector.astype('float32'), k)
relevant_reviews = [reviews[i] for i in I[0] if i >= 0]

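# Construct Prompt for GPT-4o
# Note: only the first 10 retrieved reviews are interpolated below to keep the notebook output short.
analysis_prompt = f"""
Analyze the following {len(relevant_reviews)} reviews for the VICTGOAL Bike Helmet.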
Extract the following details in JSON format:
1. Design (Shape, Style, Vibe)
2. Materials (Shell, Liner, Finish)
3. Key Features (Visor, Goggles, Light)
4. Color Scheme
5. A detailed IMAGE PROMPT for a diffusion model.

Reviews:
{relevant_reviews[:10]} ... (truncated for brevity)
"""

print("Prompt constructed. (Skipping actual API call in notebook to save credits, loading cached result...)")

# Load Cached Features
if os.path.exists('helmet_extracted_features.json'):
    with open('helmet_extracted_features.json', 'r') as f:
        features = json.load(f)
    print(json.dumps(features, indent=2))
else:
    print("Cached features not found.")

## Part 4: Image Generation (Local)
We render the extracted prompts locally with Stable Diffusion v1.5 and OpenJourney; the cell below shows the OpenJourney run.

In [None]:
import torch
from diffusers import StableDiffusionPipeline
from IPython.display import Image, display

# Prompts
prompts = [
    {"id": "p1_reviews", "desc": "Review-Based", "text": "...futuristic design with magnetic goggles..."},
    {"id": "p2_specs", "desc": "Specs-Based", "text": "...dual-tone fluorescent yellow and black..."},
    {"id": "p3_action", "desc": "Action Shot", "text": "...cinematic action shot..."}
]

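# Generation is disabled by default to prevent auto-run; set RUN_GENERATION = True to regenerate.
RUN_GENERATION = False

if RUN_GENERATION:
    model_id = "prompthero/openjourney"
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    pipe = pipe.to("mps")  # Apple Silicon GPU; use "cuda" or "cpu" on other hardware

    os.makedirs("q3_generated_images", exist_ok=True)  # image.save fails if the folder is missing
    for p in prompts:
        image = pipe(p['text']).images[0]
        image.save(f"q3_generated_images/model2_openjourney_{p['id']}.png")

print("Images were generated locally in an earlier run. Displaying results below:")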

### Generated Images Results

In [None]:
# Display Images
import glob
images = glob.glob("q3_generated_images/*.png")
for img_path in sorted(images):
    print(f"Displaying: {img_path}")
    display(Image(filename=img_path, width=400))