# 🎨 Style-Based Recommendation System
## Training & Inference Pipeline

---

### 📋 Project Overview

This notebook implements an end-to-end **customer style-based recommendation system** that:

1. **Extracts visual features** from product images using MobileNetV2
2. **Builds customer style profiles** from interaction history (weighted embeddings)
3. **Generates personalized recommendations** based on visual similarity
4. **Evaluates performance** using offline metrics

---

### 🎯 Key Concepts

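**Customer Style Profile**: A weighted average of the image embeddings of products the customer has interacted with. Each interaction is weighted by its type (purchase > wishlist > add-to-cart > view > click, see `INTERACTION_WEIGHTS`) and down-weighted as it ages (`RECENCY_HALF_LIFE_DAYS`), so a recent purchase counts far more than an old view.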
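The profile described above could be computed roughly as follows. This is a minimal sketch, not `CustomerStyleProfiler`'s implementation: `build_style_profile` is a hypothetical helper, and the exponential recency decay `0.5 ** (age_days / half_life_days)` is an assumption about how the half-life is applied.

```python
from datetime import datetime, timedelta

import numpy as np


def build_style_profile(events, embeddings, type_weights, half_life_days, now):
    """Hypothetical helper: weighted, recency-decayed mean of product embeddings.

    events: iterable of (product_id, interaction_type, timestamp) tuples.
    embeddings: dict mapping product_id -> 1-D numpy array.
    """
    vectors, weights = [], []
    for product_id, interaction_type, timestamp in events:
        if product_id not in embeddings:
            continue  # products without an image embedding cannot contribute
        age_days = (now - timestamp).total_seconds() / 86400.0
        decay = 0.5 ** (age_days / half_life_days)  # assumed: weight halves every half-life
        vectors.append(embeddings[product_id])
        weights.append(type_weights.get(interaction_type, 1.0) * decay)
    if not vectors or sum(weights) == 0:
        return None
    return np.average(np.stack(vectors), axis=0, weights=np.asarray(weights))


now = datetime(2024, 11, 1)
emb = {"p1": np.array([1.0, 0.0]), "p2": np.array([0.0, 1.0])}
events = [("p1", "purchase", now - timedelta(days=30)), ("p2", "view", now)]
profile = build_style_profile(events, emb, {"purchase": 10.0, "view": 1.0}, 30, now)
print(profile)  # purchase weight 10 * 0.5 = 5, view weight 1 -> [5/6, 1/6]
```

With these made-up inputs, a 30-day-old purchase still outweighs today's view five to one.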

**Recommendation Strategy**: Rank products by the cosine similarity between their image embeddings and the customer's style profile, skip items the customer has already purchased, and return the top N.
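A minimal sketch of this ranking step, assuming embeddings are stored as one row per product in a NumPy matrix. `recommend_top_n` is a hypothetical helper, not the signature of `CustomerStyleProfiler.recommend_by_style`:

```python
import numpy as np


def recommend_top_n(profile, product_ids, embedding_matrix, n=10, exclude=()):
    """Hypothetical helper: top-n products by cosine similarity to a style profile."""
    if n <= 0:
        return []
    eps = 1e-12  # avoids division by zero for all-zero vectors
    matrix = embedding_matrix / (np.linalg.norm(embedding_matrix, axis=1, keepdims=True) + eps)
    query = profile / (np.linalg.norm(profile) + eps)
    scores = matrix @ query  # dot product of unit vectors = cosine similarity
    excluded = set(exclude)
    results = []
    for idx in np.argsort(-scores):  # highest similarity first
        if product_ids[idx] in excluded:
            continue  # e.g. items the customer already purchased
        results.append((product_ids[idx], float(scores[idx])))
        if len(results) == n:
            break
    return results


ids = ["a", "b", "c"]
matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(recommend_top_n(np.array([1.0, 0.1]), ids, matrix, n=2, exclude={"a"}))  # 'c' then 'b'
```

Normalising the matrix once lets a single matrix-vector product score the whole catalog; the scores should agree with `sklearn.metrics.pairwise.cosine_similarity`, which the notebook imports.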

---

### 📊 Expected Outputs

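- **Product embeddings** (1280-dim vectors) for every interacted-with product that has an image
- **Customer style profiles** (1280-dim vectors) for customers with at least `MIN_INTERACTIONS` interactions
- **Top-N recommendations** for each profiled customer
- **Performance metrics**: Precision@K, Recall@K, Hit Rate@K, catalog coverage, Gini coefficient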
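The per-customer metrics and catalog coverage reduce to simple set arithmetic. A minimal sketch with made-up IDs; Hit Rate@K is then the share of customers with at least one hit:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for one customer."""
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = len(set(top_k) & relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def catalog_coverage(recommendation_lists, catalog_size):
    """Share of the catalog that appears in at least one customer's list."""
    unique = {pid for recs in recommendation_lists for pid in recs}
    return len(unique) / catalog_size if catalog_size else 0.0


p, r = precision_recall_at_k(["a", "b", "c", "d"], ["b", "x"], k=4)
print(p, r)  # 1 hit out of 4 recommended, 1 of 2 relevant -> 0.25 0.5
print(catalog_coverage([["a", "b"], ["b", "c"]], catalog_size=10))  # 3 unique / 10 -> 0.3
```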

---

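### 🗂️ Dataset Requirements

- `sample_customers.csv` - Customer profiles
- `sample_interactions.csv` - Browsing/purchase history
- `sample_order_items.csv` - Order line items (loaded, not used further)
- Product catalog with `artikel_id`, `produkt_de`, `kollektion_de`, `preis` columns (used only to display product details)
- `data/collection_images/` - Product images, named `<product_id>.jpg`
- MobileNetV2 weights (the code below loads torchvision's ImageNet-pretrained weights)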

In [None]:
# ============================================================
# CONFIGURATION
# ============================================================

# Mode Selection
TEST_MODE = False  # Set to True for quick testing with limited data

# Paths
DATA_DIR = "/project/data"
IMAGE_DIR = "/project/data/collection_images"
MODEL_DIR = "/project/models"
OUTPUT_DIR = "/project/outputs"

# Model Settings
IMG_SIZE = 224
EMBEDDING_DIM = 1280  # MobileNetV2 output dimension

# Recommendation Settings
MIN_INTERACTIONS = 3  # Minimum interactions to build customer profile
RECENCY_HALF_LIFE_DAYS = 30  # Weight decay for older interactions
TOP_N_RECOMMENDATIONS = 10  # Number of recommendations to generate
TOP_K = TOP_N_RECOMMENDATIONS  # Alias used by the recommendation and evaluation cells

# Interaction Weights (how much each interaction type contributes to style profile)
INTERACTION_WEIGHTS = {
    'purchase': 10.0,         # Strongest signal
    'add_to_wishlist': 5.0,   # Strong intent
    'add_to_cart': 3.0,       # Medium intent
    'view': 1.0,              # Weak signal
    'click': 0.5              # Weakest signal
}

# Test Mode Settings
if TEST_MODE:
    MAX_PRODUCTS = 50  # Limit products for testing
    MAX_CUSTOMERS = 10  # Limit customers for testing
    print("üß™ TEST MODE: Using limited data for quick validation")
else:
    MAX_PRODUCTS = None  # Use all products
    MAX_CUSTOMERS = None  # Use all customers
    print("üéØ PRODUCTION MODE: Using full dataset")

print(f"\n{'='*60}")
print(f"  Image Size: {IMG_SIZE}x{IMG_SIZE}")
print(f"  Embedding Dimension: {EMBEDDING_DIM}")
print(f"  Min Interactions: {MIN_INTERACTIONS}")
print(f"  Recency Half-Life: {RECENCY_HALF_LIFE_DAYS} days")
print(f"  Top-N Recommendations: {TOP_N_RECOMMENDATIONS}")
print(f"{'='*60}")
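To see how the interaction weights and the half-life combine, here is a small sketch of the effective weight of a single interaction. It assumes the profiler multiplies the type weight by an exponential decay `0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)`; the exact decay used inside `CustomerStyleProfiler` is not shown in this notebook, and `effective_weight` is a hypothetical helper:

```python
def effective_weight(interaction_type, age_days, weights, half_life_days=30):
    """Type weight scaled by an (assumed) exponential recency decay."""
    return weights.get(interaction_type, 1.0) * 0.5 ** (age_days / half_life_days)


# Local copy of the configured weights so this sketch runs on its own
weights = {'purchase': 10.0, 'add_to_wishlist': 5.0, 'add_to_cart': 3.0, 'view': 1.0, 'click': 0.5}
for itype, age in [('purchase', 0), ('purchase', 60), ('view', 0), ('click', 30)]:
    print(f"{itype:>9} @ {age:>2} days: {effective_weight(itype, age, weights):.2f}")
```

Under this assumption a two-month-old purchase (10 × 0.25 = 2.5) still outweighs a view from today by 2.5x.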

In [None]:
# ============================================================
# IMPORTS
# ============================================================

import os
import sys
from pathlib import Path
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from tqdm import tqdm
import pickle
import json

# Deep Learning
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import models
from PIL import Image

# Similarity & Metrics
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Custom modules
sys.path.append('/project/code')
from customer_style_profiler import CustomerStyleProfiler

# Setup
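device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✓ Using device: {device}")

# Create output and model directories (artifacts are written to both)
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
Path(MODEL_DIR).mkdir(parents=True, exist_ok=True)
print(f"✓ Output directory: {OUTPUT_DIR}")
print(f"✓ Model directory: {MODEL_DIR}")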

In [None]:
# ============================================================
# LOAD DATA
# ============================================================

print("üìä Loading datasets...")

# Load customer and interaction data
customers = pd.read_csv(f"{DATA_DIR}/sample_customers.csv")
interactions = pd.read_csv(f"{DATA_DIR}/sample_interactions.csv")
order_items = pd.read_csv(f"{DATA_DIR}/sample_order_items.csv")

# Convert timestamps
interactions['timestamp'] = pd.to_datetime(interactions['timestamp'])

# Get unique products that have interactions
unique_products = interactions['product_id'].unique()

# Apply test mode limits if needed
if TEST_MODE and MAX_PRODUCTS:
    unique_products = unique_products[:MAX_PRODUCTS]
    interactions = interactions[interactions['product_id'].isin(unique_products)]

if TEST_MODE and MAX_CUSTOMERS:
    test_customers = customers['customer_id'].head(MAX_CUSTOMERS).values
    interactions = interactions[interactions['customer_id'].isin(test_customers)]
    customers = customers[customers['customer_id'].isin(test_customers)]

print(f"\n✓ Loaded data:")
print(f"  Customers: {len(customers):,}")
print(f"  Interactions: {len(interactions):,}")
print(f"  Unique Products: {len(unique_products):,}")
print(f"  Interaction Types: {interactions['interaction_type'].value_counts().to_dict()}")

# Display sample
print(f"\nüìù Sample interactions:")
interactions.head()
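`product_embeddings` is keyed by `str(product_id)`, while `pd.read_csv` usually parses numeric ID columns as integers. If the two are not aligned, the set intersections used for exclusion and evaluation silently find nothing. A small illustration with made-up IDs:

```python
import pandas as pd

# Made-up example: numeric IDs as pandas would parse them from a CSV
purchases = pd.Series([101, 205], name="product_id")
recommended = ["101", "102"]  # keys of a dict built with str(product_id)

raw_hits = set(recommended) & set(purchases)
aligned_hits = set(recommended) & set(purchases.astype(str))
print(len(raw_hits), len(aligned_hits))  # 0 1
```

Casting the ID columns with `.astype(str)` once, right after loading, keeps every later comparison consistent.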

In [None]:
# ============================================================
# IMAGE EMBEDDING MODEL
# ============================================================

class ImageEmbeddingExtractor(nn.Module):
    """
    Extract 1280-dimensional embeddings from product images using MobileNetV2.
    This is the same architecture used in the collection prediction model.
    """
    
    def __init__(self, pretrained=True):
        super().__init__()
        # Load pretrained MobileNetV2
        mobilenet = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1 if pretrained else None)
        
        # Use feature extraction layers (before classifier)
        self.features = mobilenet.features
        self.pool = nn.AdaptiveAvgPool2d(1)
        
    def forward(self, x):
        x = self.features(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # Flatten to (batch, 1280)
        return x

# Initialize model
print("üé® Initializing embedding model...")
embedding_model = ImageEmbeddingExtractor(pretrained=True).to(device)
embedding_model.eval()

# Count parameters
total_params = sum(p.numel() for p in embedding_model.parameters())
print(f"✓ Model initialized")
print(f"  Architecture: MobileNetV2")
print(f"  Output dimension: {EMBEDDING_DIM}")
print(f"  Total parameters: {total_params:,}")
print(f"  Device: {device}")
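For a 224×224 input, `mobilenet.features` downsamples by a factor of 32 and returns a 1280×7×7 feature map; `nn.AdaptiveAvgPool2d(1)` followed by the flatten averages each channel over the 7×7 grid. The same reduction in NumPy, on a random stand-in tensor (`global_average_pool` is a hypothetical helper for illustration):

```python
import numpy as np


def global_average_pool(feature_map):
    """(batch, channels, height, width) -> (batch, channels), like AdaptiveAvgPool2d(1) + flatten."""
    return feature_map.mean(axis=(2, 3))


rng = np.random.default_rng(0)
fake_features = rng.random((2, 1280, 7, 7))  # stand-in for mobilenet.features output
pooled = global_average_pool(fake_features)
print(pooled.shape)  # (2, 1280)
```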

In [None]:
# ============================================================
# EXTRACT PRODUCT EMBEDDINGS
# ============================================================

# Image preprocessing pipeline
transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

def extract_embedding(image_path):
    """Extract 1280-dim embedding from a single image."""
    try:
        # Load and preprocess image
        img = Image.open(image_path).convert('RGB')
        img_tensor = transform(img).unsqueeze(0).to(device)
        
        # Extract embedding
        with torch.no_grad():
            embedding = embedding_model(img_tensor)
        
        return embedding.cpu().numpy().flatten()
    
    except Exception as e:
        print(f"⚠️  Error processing {image_path}: {e}")
        return None

# Extract embeddings for every product that appears in the interactions
print(f"\n🎨 Extracting embeddings for {len(unique_products)} products...")
print(f"   Image directory: {IMAGE_DIR}")

product_embeddings = {}
image_dir = Path(IMAGE_DIR)
successful = 0
failed = 0

for product_id in tqdm(unique_products, desc='Extracting embeddings'):
    image_path = image_dir / f"{product_id}.jpg"
    
    if image_path.exists():
        embedding = extract_embedding(str(image_path))
        if embedding is not None:
            product_embeddings[str(product_id)] = embedding
            successful += 1
        else:
            failed += 1
    else:
        failed += 1

total_attempted = successful + failed
print(f"\n✓ Embedding extraction complete:")
print(f"  Successful: {successful:,} products")
print(f"  Failed/Missing: {failed:,} products")
print(f"  Success rate: {(successful / total_attempted * 100 if total_attempted else 0.0):.1f}%")
print(f"  Embedding shape: {list(product_embeddings.values())[0].shape if product_embeddings else 'N/A'}")

# Save embeddings to disk for future use
embeddings_file = Path(MODEL_DIR) / "product_embeddings.npz"
np.savez_compressed(
    embeddings_file,
    product_ids=list(product_embeddings.keys()),
    embeddings=np.array(list(product_embeddings.values()))
)
print(f"\nüíæ Saved embeddings to: {embeddings_file}")

In [None]:
# ============================================================
# BUILD CUSTOMER STYLE PROFILES
# ============================================================

print("üë§ Building customer style profiles...")

# Initialize the profiler
profiler = CustomerStyleProfiler(
    product_embeddings=product_embeddings,
    recency_half_life_days=RECENCY_HALF_LIFE_DAYS
)

# Update interaction weights
profiler.INTERACTION_WEIGHTS = INTERACTION_WEIGHTS

# Build profiles for all customers with sufficient interactions
customer_profiles = profiler.build_all_profiles(
    interactions_df=interactions,
    min_interactions=MIN_INTERACTIONS
)

print(f"\n✓ Customer profile building complete:")
print(f"  Total customers in dataset: {len(customers)}")
print(f"  Customers with profiles: {len(customer_profiles)} ({len(customer_profiles)/len(customers)*100:.1f}%)")
print(f"  Customers without profiles: {len(customers) - len(customer_profiles)} (insufficient interactions)")
print(f"  Profile dimension: {list(customer_profiles.values())[0].shape[0] if customer_profiles else 'N/A'}")

# Analyze interaction coverage
customers_with_profiles = list(customer_profiles.keys())
profile_interactions = interactions[interactions['customer_id'].isin(customers_with_profiles)]

print(f"\n📊 Profile Statistics:")
print(f"  Interactions used: {len(profile_interactions):,} / {len(interactions):,} ({len(profile_interactions)/len(interactions)*100:.1f}%)")
print(f"  Avg interactions per profiled customer: {len(profile_interactions) / max(len(customer_profiles), 1):.1f}")

# Interaction breakdown for profiled customers
interaction_breakdown = profile_interactions['interaction_type'].value_counts()
print(f"\n  Interaction type breakdown (profiled customers):")
for itype, count in interaction_breakdown.items():
    weight = INTERACTION_WEIGHTS.get(itype, 1.0)
    print(f"    {itype}: {count:,} (weight: {weight}x)")

# Save profiles to disk
profiles_file = Path(MODEL_DIR) / "customer_style_profiles.npz"
np.savez_compressed(
    profiles_file,
    customer_ids=list(customer_profiles.keys()),
    profiles=np.array(list(customer_profiles.values()))
)
print(f"\nüíæ Saved profiles to: {profiles_file}")

In [None]:
# ============================================================
# GENERATE RECOMMENDATIONS FOR ALL CUSTOMERS
# ============================================================

print("üéØ Generating recommendations for all customers...")

all_recommendations = {}

for customer_id in tqdm(customers_with_profiles, desc="Generating recommendations"):
    # Get customer's interaction history to exclude already-purchased items
    customer_interactions = interactions[interactions['customer_id'] == customer_id]
    purchased_items = set(customer_interactions[customer_interactions['interaction_type'] == 'purchase']['product_id'].unique())
    
    # Generate recommendations
    recommendations = profiler.recommend_by_style(
        customer_id=customer_id,
        customer_profiles=customer_profiles,
        top_k=TOP_K,
        exclude_products=purchased_items
    )
    
    all_recommendations[customer_id] = recommendations

print(f"\n✓ Recommendation generation complete:")
print(f"  Total customers: {len(all_recommendations):,}")
print(f"  Recommendations per customer: up to {TOP_K}")
print(f"  Total recommendations: {sum(len(recs) for recs in all_recommendations.values()):,}")

# Analyze recommendation diversity
recommended_products = [prod_id for recs in all_recommendations.values() for prod_id, _ in recs]
unique_recommended = len(set(recommended_products))
total_products = len(product_embeddings)

print(f"\n📊 Recommendation Coverage:")
print(f"  Unique products recommended: {unique_recommended:,} / {total_products:,} ({unique_recommended / max(total_products, 1) * 100:.1f}%)")
print(f"  Avg recommendations per unique product: {len(recommended_products) / max(unique_recommended, 1):.1f}x")

# Most frequently recommended products
from collections import Counter
top_recommended = Counter(recommended_products).most_common(10)
print(f"\n🔥 Top 10 most frequently recommended products:")
for rank, (prod_id, count) in enumerate(top_recommended, 1):
    prod_info = products[products['artikel_id'] == prod_id].iloc[0] if prod_id in products['artikel_id'].values else None
    if prod_info is not None:
        name = prod_info['produkt_de'][:40] if 'produkt_de' in prod_info else 'Unknown'
        print(f"  {rank:2d}. {prod_id} - {name} ({count} customers)")
    else:
        print(f"  {rank:2d}. {prod_id} ({count} customers)")
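The recommendation loop rescans the full `interactions` table once per customer to find purchased items. For larger datasets, a sketch that groups purchases once up front (same sets, one pass over the data); `purchased_items_by_customer` is a hypothetical helper:

```python
import pandas as pd


def purchased_items_by_customer(interactions_df):
    """Map each customer_id to the set of product_ids they purchased."""
    purchases = interactions_df[interactions_df["interaction_type"] == "purchase"]
    return {
        customer_id: set(group["product_id"])
        for customer_id, group in purchases.groupby("customer_id")
    }


toy_interactions = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C2"],
    "product_id": ["p1", "p2", "p3", "p4"],
    "interaction_type": ["purchase", "view", "purchase", "purchase"],
})
purchased_lookup = purchased_items_by_customer(toy_interactions)
print(purchased_lookup)  # C1 -> {'p1'}, C2 -> {'p3', 'p4'} (set order may vary)
```

Inside the loop, `purchased_lookup.get(customer_id, set())` then replaces the per-customer filter.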

In [None]:
# ============================================================
# EVALUATE RECOMMENDATION PERFORMANCE
# ============================================================

print("üìä Evaluating recommendation performance...")

# For evaluation, we'll use a simple holdout strategy:
# - Use older interactions to build profiles
# - Test if recent purchases are in the top-K recommendations

# Split interactions by date at the median timestamp (the temporal midpoint of the data)
interactions_sorted = interactions.sort_values('timestamp')
split_date = interactions_sorted['timestamp'].quantile(0.5)

train_interactions = interactions_sorted[interactions_sorted['timestamp'] < split_date]
test_interactions = interactions_sorted[interactions_sorted['timestamp'] >= split_date]
test_purchases = test_interactions[test_interactions['interaction_type'] == 'purchase']

print(f"\nüìÖ Data Split:")
print(f"  Split date: {split_date.strftime('%Y-%m-%d')}")
print(f"  Train interactions: {len(train_interactions):,} ({len(train_interactions)/len(interactions)*100:.1f}%)")
print(f"  Test interactions: {len(test_interactions):,} ({len(test_interactions)/len(interactions)*100:.1f}%)")
print(f"  Test purchases: {len(test_purchases):,}")

# Build profiles using only training data
train_profiles = profiler.build_all_profiles(
    interactions_df=train_interactions,
    min_interactions=MIN_INTERACTIONS
)

print(f"\nüë§ Training profiles: {len(train_profiles)}")

# Calculate metrics for customers with both training profiles and test purchases
evaluable_customers = set(train_profiles.keys()) & set(test_purchases['customer_id'].unique())
print(f"  Evaluable customers: {len(evaluable_customers)} (have profile + test purchases)")

if len(evaluable_customers) > 0:
    # Calculate Precision@K and Recall@K
    precisions = []
    recalls = []
    hits = []
    
    for customer_id in evaluable_customers:
        # Get test purchases for this customer
        customer_test_purchases = test_purchases[test_purchases['customer_id'] == customer_id]['product_id'].unique()
        
        # Get training purchases to exclude
        customer_train_purchases = train_interactions[
            (train_interactions['customer_id'] == customer_id) & 
            (train_interactions['interaction_type'] == 'purchase')
        ]['product_id'].unique()
        
        # Generate recommendations
        recs = profiler.recommend_by_style(
            customer_id=customer_id,
            customer_profiles=train_profiles,
            top_k=TOP_K,
            exclude_products=set(customer_train_purchases)
        )
        
        recommended_ids = [prod_id for prod_id, _ in recs]
        
        # Calculate precision and recall
        hits_count = len(set(recommended_ids) & set(customer_test_purchases))
        precision = hits_count / TOP_K if TOP_K > 0 else 0
        recall = hits_count / len(customer_test_purchases) if len(customer_test_purchases) > 0 else 0
        
        precisions.append(precision)
        recalls.append(recall)
        hits.append(hits_count)
    
    # Calculate average metrics
    avg_precision = np.mean(precisions)
    avg_recall = np.mean(recalls)
    avg_hits = np.mean(hits)
    hit_rate = sum(1 for h in hits if h > 0) / len(hits)
    
    print(f"\nüéØ Performance Metrics (on test set):")
    print(f"  Precision@{TOP_K}: {avg_precision*100:.2f}%")
    print(f"  Recall@{TOP_K}: {avg_recall*100:.2f}%")
    print(f"  Hit Rate@{TOP_K}: {hit_rate*100:.2f}% ({sum(1 for h in hits if h > 0)}/{len(hits)} customers)")
    print(f"  Avg hits per customer: {avg_hits:.2f}")
    
    # Show distribution of hits
    hit_distribution = Counter(hits)
    print(f"\n  Hit distribution:")
    for hit_count in sorted(hit_distribution.keys()):
        count = hit_distribution[hit_count]
        print(f"    {hit_count} hits: {count} customers ({count/len(hits)*100:.1f}%)")
else:
    print("\n⚠️  Not enough data for meaningful evaluation")
    print("  (Need customers with both training interactions and test purchases)")
    
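# Recommendation diversity metrics (using full profiles)
# Gini over every embedded product, counting never-recommended products as 0
rec_count_lookup = Counter(recommended_products)
product_rec_counts = np.sort(np.array([rec_count_lookup.get(pid, 0) for pid in product_embeddings], dtype=float))
if product_rec_counts.size > 0 and product_rec_counts.sum() > 0:
    n_items = product_rec_counts.size
    ranks = np.arange(1, n_items + 1)  # 1-based ranks of ascending counts
    gini_coefficient = 2 * np.sum(ranks * product_rec_counts) / (n_items * product_rec_counts.sum()) - (n_items + 1) / n_items
else:
    gini_coefficient = float('nan')
print(f"\n🌈 Diversity Metrics (full dataset):")
print(f"  Catalog coverage: {unique_recommended}/{total_products} products ({unique_recommended / max(total_products, 1) * 100:.1f}%)")
print(f"  Gini coefficient: {gini_coefficient:.3f}")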
print(f"    (0 = perfect equality, 1 = perfect inequality)")
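The Gini coefficient summarises how unevenly recommendations are spread over the catalog. With per-product counts sorted ascending, $x_1 \le \dots \le x_n$, one standard form is $G = \frac{2\sum_{i=1}^{n} i\,x_i}{n\sum_{i=1}^{n} x_i} - \frac{n+1}{n}$, which is 0 when every product is recommended equally often and reaches $(n-1)/n$ when a single product receives every recommendation. A standalone sketch with sanity checks:

```python
import numpy as np


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)  # 1-based ranks
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)


print(gini([5, 5, 5, 5]))   # 0.0 (perfectly even)
print(gini([0, 0, 0, 10]))  # 0.75 = (n - 1) / n for n = 4
```

Passing a count for every product in `product_embeddings` (zeros included), not only the recommended ones, makes the figure reflect catalog-wide concentration.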

In [None]:
# ============================================================
# INFERENCE EXAMPLES - SHOW RECOMMENDATIONS FOR SAMPLE CUSTOMERS
# ============================================================

print("üîç Showing detailed recommendations for sample customers...\n")

# Take the first 3 profiled customers as examples
sample_customers = list(customers_with_profiles)[:3]

for idx, customer_id in enumerate(sample_customers, 1):
    print("="*80)
    print(f"CUSTOMER #{idx}: {customer_id}")
    print("="*80)
    
    # Get customer info
    customer_info = customers[customers['customer_id'] == customer_id].iloc[0]
    print(f"\nüë§ Customer Profile:")
    print(f"  Age: {customer_info['age']}")
    print(f"  Gender: {customer_info['gender']}")
    print(f"  Location: {customer_info['location']}")
    print(f"  Segment: {customer_info['customer_segment']}")
    print(f"  Lifetime Value: €{customer_info['lifetime_value']:,.2f}")
    print(f"  Total Orders: {customer_info['total_orders']}")
    
    # Get interaction history
    customer_interactions = interactions[interactions['customer_id'] == customer_id]
    print(f"\nüìä Interaction History:")
    print(f"  Total interactions: {len(customer_interactions)}")
    for itype, count in customer_interactions['interaction_type'].value_counts().items():
        print(f"    {itype}: {count}")
    
    # Show recent purchases
    recent_purchases = customer_interactions[
        customer_interactions['interaction_type'] == 'purchase'
    ].sort_values('timestamp', ascending=False).head(3)
    
    if len(recent_purchases) > 0:
        print(f"\nüõçÔ∏è  Recent Purchases:")
        for _, purchase in recent_purchases.iterrows():
            prod_id = purchase['product_id']
            timestamp = purchase['timestamp']
            prod_info = products[products['artikel_id'] == prod_id].iloc[0] if prod_id in products['artikel_id'].values else None
            if prod_info is not None:
                name = prod_info['produkt_de'][:50] if 'produkt_de' in prod_info else 'Unknown'
                collection = prod_info['kollektion_de'] if 'kollektion_de' in prod_info else 'Unknown'
                print(f"    • {prod_id} - {name}")
                print(f"      Collection: {collection} | Date: {timestamp.strftime('%Y-%m-%d')}")
    
    # Get recommendations (precomputed above, with purchased items already excluded)
    recommendations = all_recommendations.get(customer_id, [])
    
    print(f"\nüéØ Top {len(recommendations)} Recommendations:")
    for rank, (prod_id, similarity) in enumerate(recommendations, 1):
        prod_info = products[products['artikel_id'] == prod_id].iloc[0] if prod_id in products['artikel_id'].values else None
        if prod_info is not None:
            name = prod_info['produkt_de'][:50] if 'produkt_de' in prod_info else 'Unknown'
            collection = prod_info['kollektion_de'] if 'kollektion_de' in prod_info else 'Unknown'
            price = prod_info['preis'] if 'preis' in prod_info else 'N/A'
            print(f"  {rank:2d}. [{similarity:.4f}] {prod_id}")
            print(f"      {name}")
            print(f"      Collection: {collection} | Price: €{price}")
        else:
            print(f"  {rank:2d}. [{similarity:.4f}] {prod_id}")
    
    print("\n")

print("="*80)
print("✓ Inference examples complete")
print("="*80)
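The inference cell repeats a boolean-mask scan of `products` for every product it prints. A sketch of building a dictionary once instead; the column names come from the notebook, while the rows and the `build_product_lookup` helper are made up for illustration:

```python
import pandas as pd


def build_product_lookup(products_df):
    """Map artikel_id (as str) -> dict of the remaining product columns."""
    deduped = products_df.drop_duplicates("artikel_id").copy()
    deduped["artikel_id"] = deduped["artikel_id"].astype(str)
    return deduped.set_index("artikel_id").to_dict("index")


toy_products = pd.DataFrame({
    "artikel_id": [101, 102],
    "produkt_de": ["Summer dress", "Linen blouse"],
    "kollektion_de": ["Summer", "Autumn"],
    "preis": [79.9, 49.9],
})
lookup = build_product_lookup(toy_products)
info = lookup.get("101")
print(info["produkt_de"] if info else "Unknown")  # Summer dress
print(lookup.get("999"))  # None for products missing from the catalog
```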

In [None]:
# ============================================================
# VISUALIZE RECOMMENDATION QUALITY
# ============================================================

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Recommendation System Analysis', fontsize=16, fontweight='bold')

# 1. Profile Coverage
ax1 = axes[0, 0]
profile_coverage = pd.DataFrame({
    'Category': ['With Profile', 'Without Profile'],
    'Count': [len(customer_profiles), len(customers) - len(customer_profiles)]
})
sns.barplot(data=profile_coverage, x='Category', y='Count', ax=ax1, palette='viridis')
ax1.set_title('Customer Profile Coverage', fontweight='bold')
ax1.set_ylabel('Number of Customers')
for i, v in enumerate(profile_coverage['Count']):
    ax1.text(i, v + 0.5, str(v), ha='center', fontweight='bold')

# 2. Recommendation Diversity
ax2 = axes[0, 1]
product_freq = Counter(recommended_products)
freq_distribution = Counter(product_freq.values())
sorted_freqs = sorted(freq_distribution.items())
ax2.bar([f for f, _ in sorted_freqs], [c for _, c in sorted_freqs], color='coral')
ax2.set_title('Product Recommendation Frequency', fontweight='bold')
ax2.set_xlabel('Times Recommended')
ax2.set_ylabel('Number of Products')
ax2.grid(axis='y', alpha=0.3)

# 3. Interaction Type Distribution
ax3 = axes[1, 0]
interaction_counts = profile_interactions['interaction_type'].value_counts()
colors = plt.cm.Set3(range(len(interaction_counts)))
wedges, texts, autotexts = ax3.pie(
    interaction_counts.values,
    labels=interaction_counts.index,
    autopct='%1.1f%%',
    colors=colors,
    startangle=90
)
ax3.set_title('Interaction Type Distribution\n(Profiled Customers)', fontweight='bold')
for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontweight('bold')

# 4. Similarity Score Distribution
ax4 = axes[1, 1]
all_similarities = [sim for recs in all_recommendations.values() for _, sim in recs]
ax4.hist(all_similarities, bins=30, color='skyblue', edgecolor='black', alpha=0.7)
ax4.axvline(np.mean(all_similarities), color='red', linestyle='--', linewidth=2, label=f'Mean: {np.mean(all_similarities):.4f}')
ax4.set_title('Recommendation Similarity Scores', fontweight='bold')
ax4.set_xlabel('Cosine Similarity')
ax4.set_ylabel('Frequency')
ax4.legend()
ax4.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nüìä Visualization Summary:")
print(f"  Profile coverage: {len(customer_profiles)}/{len(customers)} ({len(customer_profiles)/len(customers)*100:.1f}%)")
print(f"  Unique products recommended: {unique_recommended}/{total_products} ({unique_recommended/total_products*100:.1f}%)")
print(f"  Mean similarity score: {np.mean(all_similarities):.4f}")
print(f"  Median similarity score: {np.median(all_similarities):.4f}")
print(f"  Min similarity score: {np.min(all_similarities):.4f}")
print(f"  Max similarity score: {np.max(all_similarities):.4f}")

In [None]:
# ============================================================
# SAVE MODELS AND ARTIFACTS FOR DEPLOYMENT
# ============================================================

print("üíæ Saving models and artifacts...")

# 1. Product embeddings (already saved during extraction)
print(f"✓ Product embeddings: {embeddings_file}")

# 2. Customer profiles (already saved)
print(f"✓ Customer profiles: {profiles_file}")

# 3. Save recommendations to JSON for easy deployment
# (IDs and scores are cast to native str/float so json.dump accepts NumPy types)
recommendations_file = Path(MODEL_DIR) / "recommendations.json"
recommendations_json = {
    str(customer_id): [
        {"product_id": str(prod_id), "similarity": float(sim)}
        for prod_id, sim in recs
    ]
    for customer_id, recs in all_recommendations.items()
}

with open(recommendations_file, 'w') as f:
    json.dump(recommendations_json, f, indent=2)
print(f"✓ Recommendations: {recommendations_file}")

# 4. Save metadata
metadata = {
    "model_type": "MobileNetV2",
    "embedding_dim": EMBEDDING_DIM,
    "image_size": IMG_SIZE,
    "test_mode": TEST_MODE,
    "min_interactions": MIN_INTERACTIONS,
    "top_k": TOP_K,
    "recency_half_life_days": RECENCY_HALF_LIFE_DAYS,
    "interaction_weights": INTERACTION_WEIGHTS,
    "total_customers": len(customers),
    "customers_with_profiles": len(customer_profiles),
    "total_products": len(products),
    "products_with_embeddings": len(product_embeddings),
    "unique_products_recommended": unique_recommended,
    "catalog_coverage_pct": round(unique_recommended / max(total_products, 1) * 100, 2),
    "timestamp": pd.Timestamp.now().isoformat()
}

metadata_file = Path(MODEL_DIR) / "recommendation_metadata.json"
with open(metadata_file, 'w') as f:
    json.dump(metadata, f, indent=2)
print(f"✓ Metadata: {metadata_file}")

# 5. Save profiler configuration for easy reload
profiler_config = {
    "recency_half_life_days": RECENCY_HALF_LIFE_DAYS,
    "interaction_weights": INTERACTION_WEIGHTS,
    "min_interactions": MIN_INTERACTIONS
}

profiler_config_file = Path(MODEL_DIR) / "profiler_config.json"
with open(profiler_config_file, 'w') as f:
    json.dump(profiler_config, f, indent=2)
print(f"✓ Profiler config: {profiler_config_file}")

print(f"\nüì¶ All artifacts saved to: {MODEL_DIR}")
print(f"\nTo load and use the recommendation system:")
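print(f"""
import sys
import json
import numpy as np

sys.path.append('/project/code')
from customer_style_profiler import CustomerStyleProfiler

# Load embeddings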
data = np.load('{embeddings_file}')
product_embeddings = dict(zip(data['product_ids'], data['embeddings']))

# Load profiles
profile_data = np.load('{profiles_file}')
customer_profiles = dict(zip(profile_data['customer_ids'], profile_data['profiles']))

# Load config
with open('{profiler_config_file}', 'r') as f:
    config = json.load(f)

# Initialize profiler
profiler = CustomerStyleProfiler(
    product_embeddings=product_embeddings,
    recency_half_life_days=config['recency_half_life_days']
)
profiler.INTERACTION_WEIGHTS = config['interaction_weights']

# Generate recommendations
recommendations = profiler.recommend_by_style(
    customer_id='C001',  # example ID: use one that exists in customer_profiles
    customer_profiles=customer_profiles,
    top_k=10
)
""")
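`json.dump` accepts only native Python keys and values, so NumPy integer customer IDs or `float32` similarity scores coming out of the profiler would make the export fail. A sketch that converts everything explicitly; the input structure mirrors `all_recommendations`, while the values and the `recommendations_to_json` helper are made up:

```python
import json

import numpy as np


def recommendations_to_json(all_recs):
    """Serialise {customer_id: [(product_id, similarity), ...]} with native Python types."""
    return json.dumps({
        str(customer_id): [
            {"product_id": str(product_id), "similarity": float(similarity)}
            for product_id, similarity in recs
        ]
        for customer_id, recs in all_recs.items()
    }, indent=2)


toy_recs = {np.int64(7): [("101", np.float32(0.5))]}
# json.dumps(toy_recs) would raise TypeError because of the NumPy key and value types
payload = recommendations_to_json(toy_recs)
print(json.loads(payload))  # {'7': [{'product_id': '101', 'similarity': 0.5}]}
```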

In [None]:
# ============================================================
# SUMMARY & NEXT STEPS
# ============================================================

print("üéâ Style-Based Recommendation System - Complete!")
print("="*80)

print("\nüìä FINAL RESULTS:")
print(f"  ‚Ä¢ Customers analyzed: {len(customers):,}")
print(f"  ‚Ä¢ Customers with profiles: {len(customer_profiles):,} ({len(customer_profiles)/len(customers)*100:.1f}%)")
print(f"  ‚Ä¢ Products in catalog: {len(products):,}")
print(f"  ‚Ä¢ Products with embeddings: {len(product_embeddings):,}")
print(f"  ‚Ä¢ Unique products recommended: {unique_recommended:,} ({unique_recommended/total_products*100:.1f}% coverage)")
print(f"  ‚Ä¢ Recommendations per customer: {TOP_K}")
print(f"  ‚Ä¢ Mean similarity score: {np.mean(all_similarities):.4f}")

if len(evaluable_customers) > 0:
    print(f"\nüéØ EVALUATION METRICS (Test Set):")
    print(f"  ‚Ä¢ Precision@{TOP_K}: {avg_precision*100:.2f}%")
    print(f"  ‚Ä¢ Recall@{TOP_K}: {avg_recall*100:.2f}%")
    print(f"  ‚Ä¢ Hit Rate@{TOP_K}: {hit_rate*100:.2f}%")

print(f"\nüíæ SAVED ARTIFACTS:")
print(f"  ‚Ä¢ {embeddings_file}")
print(f"  ‚Ä¢ {profiles_file}")
print(f"  ‚Ä¢ {recommendations_file}")
print(f"  ‚Ä¢ {metadata_file}")
print(f"  ‚Ä¢ {profiler_config_file}")

print("\nüöÄ NEXT STEPS:")
print("""
1. Production Deployment:
   - Integrate with e-commerce platform API
   - Set up batch processing for nightly profile updates
   - Implement real-time recommendation serving (< 50ms latency)
   - Add caching layer (Redis) for frequently accessed recommendations

2. Model Improvements:
   - A/B test different interaction weights
   - Experiment with recency decay parameters
   - Try ensemble with collaborative filtering
   - Add demographic filters (age, gender, location)
   - Incorporate seasonal trends (holidays, seasons)

3. Monitoring & Analytics:
   - Track click-through rate (CTR) on recommendations
   - Measure conversion rate and revenue impact
   - Monitor recommendation diversity over time
   - Set up alerts for coverage drops or quality degradation

4. Advanced Features:
   - "Shop the Look" - recommend complete outfits
   - "Similar Customers" - show what similar shoppers bought
   - "Trending in Your Style" - personalized trending items
   - Email campaigns with personalized recommendations
   - "Customers also bought" - order-based recommendations

5. Data Collection:
   - Gather more customer interaction data
   - Collect explicit feedback (ratings, likes)
   - Track recommendation performance metrics
   - Build feedback loop for continuous improvement
""")

print("="*80)
print("✓ Notebook execution complete!")

## 📑 Notebook Sections

- 1️⃣ Configuration
- 2️⃣ Import Libraries
- 3️⃣ Load Data
- 4️⃣ Build Image Embedding Model
- 5️⃣ Extract Product Embeddings
- 6️⃣ Build Customer Style Profiles
- 7️⃣ Generate Recommendations
- 8️⃣ Evaluate Performance
- 9️⃣ Inference Examples
- 🔟 Visualize Recommendations
- 1️⃣1️⃣ Save Models & Artifacts
- 1️⃣2️⃣ Summary & Next Steps