# 🎯 Recommendation Engine Evaluation

## Overview
This notebook provides a comprehensive evaluation of the **Recommendation Engine** for the JanSport e-commerce store:

### 📋 Evaluation Coverage:
- **⚡ API Performance**: Response time, throughput, and cache hit rate across the 5 strategies
- **🎯 Accuracy**: Precision, Recall, and F1-score for the hybrid algorithm
- **📊 Recommendation Quality**: Click-through rate and conversion rate on JanSport products
- **❄️ Cold Start Handling**: New user/product scenarios with 100+ JanSport products
- **🔄 Stability & Load**: Load testing and error handling with PostgreSQL + Redis

### 🎯 Performance Targets:
- **Response time**: < 500ms (P95) for cached responses, < 2s for computed responses
- **Cache hit rate**: > 80% with a Redis TTL of 1 hour
- **Click-through rate**: > 2% (business metric)
- **Cold start coverage**: > 90% for new users

### 🏗️ System Architecture:
- **FastAPI Service** (Port 8001): Python 3.11, async endpoints
- **PostgreSQL Database**: 7 recommendation tables (interactions, preferences, similarities, etc.)
- **Redis Cache**: High-speed caching with TTL optimization
- **ML Hybrid Engine**: 40% Content-based + 60% Collaborative Filtering
- **5 Recommendation Strategies**: Hybrid, Content, Collaborative, Trending, Frequently Bought Together
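
The 40/60 weighting listed above can be read as a weighted sum of two per-product scores. The following is a minimal sketch under that reading, not the service's actual implementation: the names `blend_scores` and `top_n`, the assumption that both components produce scores in [0, 1], and the choice to treat a missing score as 0 are all illustrative.

```python
from typing import Dict, List

# Weights taken from the architecture summary above
CONTENT_WEIGHT = 0.4
COLLABORATIVE_WEIGHT = 0.6


def blend_scores(content: Dict[str, float], collaborative: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of two score maps keyed by product ID.

    Assumption: a product missing from one component contributes 0 from it.
    """
    product_ids = set(content) | set(collaborative)
    return {
        pid: CONTENT_WEIGHT * content.get(pid, 0.0)
        + COLLABORATIVE_WEIGHT * collaborative.get(pid, 0.0)
        for pid in product_ids
    }


def top_n(scores: Dict[str, float], n: int = 10) -> List[str]:
    """Product IDs ordered by descending score, ties broken by ID for determinism."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:n]


blended = blend_scores({"a": 1.0, "b": 0.5}, {"b": 1.0, "c": 0.5})
ranking = top_n(blended)
```

A product scored by both components (`b` here) can outrank one that is strong in only a single component, which is the usual motivation for a hybrid blend.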

---
**📅 Date:** December 18, 2025  
**🏷️ Version:** 1.2  
**🎯 Service:** Recommendation Engine (Port 8001)  
**🛍️ Store:** JanSport Backpacks & Accessories

## 1. Environment Setup and Dependencies

In [None]:
# Import required libraries for JanSport Recommendation Engine evaluation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
import time
import logging
from datetime import datetime, timedelta
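import warnings
warnings.filterwarnings('ignore')

# Logger used by the test loops in later cells
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
logger = logging.getLogger('recommendation_eval')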

# For database connection (PostgreSQL shared with Medusa)
import psycopg2
from sqlalchemy import create_engine
import redis

# For metrics calculation
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics.pairwise import cosine_similarity
from typing import Dict, List, Any, Optional, Tuple

# Configure visualization
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 50)

# JanSport E-commerce Recommendation Service Configuration
RECOMMENDATION_SERVICE_URL = "http://localhost:8001"  # FastAPI Recommendation Service
MEDUSA_SERVICE_URL = "http://localhost:9000"          # Medusa v2 Backend for product data

# Shared PostgreSQL Database Configuration (same as Medusa)
DB_CONFIG = {
    'host': 'localhost',
    'port': 5432,
    'database': 'medusa-store',  # Database shared with the Medusa backend
    'user': 'postgres',
    'password': 'supersecretpassword'
}

# Redis Configuration (for caching)
REDIS_CONFIG = {
    'host': 'localhost',
    'port': 6379,
    'db': 0,  # Database 0 for recommendations
    'decode_responses': True
}

print("🎯 Recommendation Engine Evaluation - JanSport E-commerce")
print("=" * 70)
print(f"📅 Evaluation Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🎯 Target Service: {RECOMMENDATION_SERVICE_URL} (FastAPI)")
print(f"🛍️ Product Backend: {MEDUSA_SERVICE_URL} (Medusa v2)")
print(f"🗄️ Database: {DB_CONFIG['host']}:{DB_CONFIG['port']} ({DB_CONFIG['database']})")
print(f"🔄 Redis Cache: {REDIS_CONFIG['host']}:{REDIS_CONFIG['port']} (DB {REDIS_CONFIG['db']})")
print(f"💼 Store: JanSport Backpacks & Accessories (100+ products)")
print(f"🎯 ML Algorithm: Hybrid (40% Content + 60% Collaborative Filtering)")
print(f"📊 Strategies: 5 recommendation strategies (Hybrid, Content, Collaborative, Trending, Together)")
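
The configuration above stores the PostgreSQL settings as a dictionary, while SQLAlchemy's `create_engine` takes a URL. A small sketch of one way to bridge the two, using only the standard library; the helper name `build_postgres_url` is hypothetical, and percent-encoding the credentials is an added precaution for passwords containing characters such as `@` or `/`.

```python
from urllib.parse import quote


def build_postgres_url(cfg: dict) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL from a DB_CONFIG-shaped dict.

    User and password are percent-encoded so reserved characters
    do not break URL parsing.
    """
    user = quote(str(cfg["user"]), safe="")
    password = quote(str(cfg["password"]), safe="")
    return (
        f"postgresql+psycopg2://{user}:{password}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['database']}"
    )


example_url = build_postgres_url({
    "host": "localhost",
    "port": 5432,
    "database": "medusa-store",
    "user": "postgres",
    "password": "p@ss word",  # illustrative value, not the real credential
})
```

The result can be passed straight to `create_engine(...)`. Reading the password from an environment variable rather than hardcoding it in the notebook would be a natural follow-up.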

## 2. Database Connection and Schema Validation

In [None]:
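# Connect to database and validate schema
try:
    # PostgreSQL connection, built from DB_CONFIG defined in the setup cell
    database_url = (
        f"postgresql+psycopg2://{DB_CONFIG['user']}:{DB_CONFIG['password']}"
        f"@{DB_CONFIG['host']}:{DB_CONFIG['port']}/{DB_CONFIG['database']}"
    )
    engine = create_engine(database_url)
    conn = engine.connect()
    
    # Redis connection, using REDIS_CONFIG defined in the setup cell
    redis_client = redis.Redis(**REDIS_CONFIG)
    redis_client.ping()
    
    print("✅ Database connections established successfully!")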
    
    # Check recommendation tables
    rec_tables_query = """
    SELECT table_name 
    FROM information_schema.tables 
    WHERE table_schema = 'public' 
    AND table_name LIKE 'rec_%'
    ORDER BY table_name;
    """
    
    rec_tables = pd.read_sql(rec_tables_query, conn)
    print(f"\n📊 Found {len(rec_tables)} recommendation tables:")
    for table in rec_tables['table_name']:
        print(f"  - {table}")
    
    # Validate table structures
    for table in rec_tables['table_name']:
        count_query = f"SELECT COUNT(*) as count FROM {table};"
        count_result = pd.read_sql(count_query, conn)
        print(f"  📈 {table}: {count_result['count'].iloc[0]} records")
    
    # Check service health
    health_response = requests.get(f"{RECOMMENDATION_SERVICE_URL}/health", timeout=5)
    if health_response.status_code == 200:
        health_data = health_response.json()
        print(f"\n✅ Recommendation service is healthy!")
        print(f"📊 Service status: {health_data}")
    else:
        print(f"❌ Service health check failed: {health_response.status_code}")
        
except Exception as e:
    print(f"❌ Connection failed: {str(e)}")
    raise
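
The cell above lists whatever `rec_%` tables exist but does not check that the tables queried later are among them. A minimal sketch of such a check follows. `REQUIRED_TABLES` contains only the two tables this notebook actually queries; the helper name `missing_tables` and the sample table name `rec_product_similarities` are hypothetical.

```python
from typing import Iterable, List

# Tables that later cells of this notebook query directly
REQUIRED_TABLES = {"rec_user_interactions", "rec_user_preferences"}


def missing_tables(found_tables: Iterable[str], required: Iterable[str] = REQUIRED_TABLES) -> List[str]:
    """Return the required table names that were not found, sorted for stable output."""
    return sorted(set(required) - set(found_tables))


# 'rec_product_similarities' is a made-up name used only for this example
missing = missing_tables(["rec_user_interactions", "rec_product_similarities"])
```

In the notebook this could be called as `missing_tables(rec_tables['table_name'])`, stopping early with a clear message when the list is not empty.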

## 3. Load Test Data for Recommendations

In [None]:
# Load existing data and generate test scenarios for JanSport products
print("📊 Loading existing recommendation data for JanSport E-commerce...")

try:
    # Load user interactions
    interactions_query = """
    SELECT 
        user_id, 
        session_id, 
        product_id,
        product_handle,
        interaction_type,
        weight,
        created_at,
        metadata
    FROM rec_user_interactions
    ORDER BY created_at DESC
    LIMIT 1000;
    """

    interactions_df = pd.read_sql(interactions_query, conn)
    print(f"✅ Loaded {len(interactions_df)} user interactions")
    
    if len(interactions_df) > 0:
        print(f"📈 Interaction types: {interactions_df['interaction_type'].value_counts().to_dict()}")

    # Load user preferences  
    preferences_query = """
    SELECT 
        user_id,
        category,
        score,
        interaction_count,
        last_updated
    FROM rec_user_preferences
    ORDER BY score DESC
    LIMIT 100;
    """

    preferences_df = pd.read_sql(preferences_query, conn)
    print(f"✅ Loaded {len(preferences_df)} user preferences")
    
    if len(preferences_df) > 0:
        print(f"🎯 Top categories: {preferences_df['category'].value_counts().head().to_dict()}")

    # Load JanSport products for testing
    products_query = """
    SELECT 
        p.id,
        p.handle,
        p.title,
        p.collection_id,
        p.thumbnail,
        p.created_at,
        pv.title as variant_title,
        pv.id as variant_id,
        ma.amount as price
    FROM product p
    LEFT JOIN product_variant pv ON p.id = pv.product_id
    LEFT JOIN money_amount ma ON pv.id = ma.variant_id
    WHERE p.title ILIKE '%JanSport%' OR p.handle ILIKE '%jansport%'
    ORDER BY p.created_at DESC
    LIMIT 50;
    """

    products_df = pd.read_sql(products_query, conn)
    print(f"✅ Loaded {len(products_df)} JanSport products for testing")
    
    if len(products_df) > 0:
        print(f"🎒 Sample products:")
        for _, product in products_df.head(3).iterrows():
            # LEFT JOIN rows without a price come back as NaN/None; amounts are assumed to be in cents
            price = f"${product['price']/100:.2f}" if pd.notna(product['price']) else "N/A"
            print(f"   • {product['title']} ({product['handle']}) - {price}")

    # Generate test user profiles for different scenarios
    test_users = [
        {
            'user_id': 'test_user_new',
            'profile': 'new_user',
            'description': 'New user with no interaction history',
            'expected_strategy': 'trending'
        },
        {
            'user_id': 'test_user_student', 
            'profile': 'student',
            'description': 'Student who likes school backpacks',
            'expected_strategy': 'content_based',
            'preferences': ['backpack', 'school']
        },
        {
            'user_id': 'test_user_traveler',
            'profile': 'traveler', 
            'description': 'Traveler who likes large backpacks and laptop bags',
            'expected_strategy': 'hybrid',
            'preferences': ['travel', 'laptop', 'large']
        },
        {
            'user_id': 'test_user_frequent',
            'profile': 'frequent_buyer',
            'description': 'Frequent customer with many past purchases',
            'expected_strategy': 'collaborative',
            'interaction_count': 50
        },
        {
            'user_id': 'test_user_premium',
            'profile': 'premium_customer',
            'description': 'VIP customer who prefers premium products',
            'expected_strategy': 'hybrid',
            'price_range': 'high'
        }
    ]

    test_users_df = pd.DataFrame(test_users)
    
    # Generate test product scenarios
    if len(products_df) > 0:
        test_products = products_df.head(10).copy()
        test_products['test_scenario'] = [
            'popular_item', 'new_item', 'sale_item', 'trending_item', 'seasonal_item',
            'premium_item', 'basic_item', 'limited_item', 'gift_item', 'bestseller_item'
        ][:len(test_products)]
    else:
        # Fallback test products if no real products found
        test_products = pd.DataFrame([
            {'id': 'prod_test_1', 'handle': 'jansport-superbreak', 'title': 'JanSport Superbreak', 'test_scenario': 'popular_item'},
            {'id': 'prod_test_2', 'handle': 'jansport-right-pack', 'title': 'JanSport Right Pack', 'test_scenario': 'new_item'},
            {'id': 'prod_test_3', 'handle': 'jansport-big-student', 'title': 'JanSport Big Student', 'test_scenario': 'trending_item'},
        ])

    print(f"\n🧪 Test Configuration:")
    print(f"   • Test users: {len(test_users_df)} profiles")
    print(f"   • Test products: {len(test_products)} products")
    print(f"   • User profiles: {', '.join(test_users_df['profile'].tolist())}")
    
    # Display test summary
    print(f"\n📋 Test Data Summary:")
    print(f"   • Historical interactions: {len(interactions_df)}")
    print(f"   • User preferences: {len(preferences_df)}")
    print(f"   • Available products: {len(products_df)}")
    print(f"   • Test scenarios ready: ✅")

except Exception as e:
    print(f"❌ Error loading test data: {str(e)}")
    # Create minimal test data for demo
    test_users_df = pd.DataFrame([
        {'user_id': 'demo_user', 'profile': 'demo', 'description': 'Demo user for testing'}
    ])
    test_products = pd.DataFrame([
        {'id': 'demo_prod', 'handle': 'demo-product', 'title': 'Demo Product', 'test_scenario': 'demo_item'}
    ])
    interactions_df = pd.DataFrame()
    preferences_df = pd.DataFrame()
    products_df = pd.DataFrame()
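
Because the product query LEFT JOINs variants and prices, a product with several variants appears in several rows, so taking the first ten rows can yield fewer than ten distinct products. A minimal sketch of collapsing such rows to one per product is shown below. It works on plain dictionaries to stay dependency-free; the helper name `one_row_per_product` and the rule of keeping the cheapest priced variant are assumptions made for illustration.

```python
from typing import Dict, List


def one_row_per_product(rows: List[dict]) -> List[dict]:
    """Keep one row per product ID, preferring the lowest non-null price.

    Output order follows the first appearance of each product ID.
    """
    best: Dict[str, dict] = {}
    for row in rows:
        current = best.get(row["id"])
        price = row.get("price")
        if current is None:
            best[row["id"]] = row
        elif price is not None and (current.get("price") is None or price < current["price"]):
            best[row["id"]] = row
    return list(best.values())


rows = [
    {"id": "p1", "price": 4500},
    {"id": "p1", "price": 3900},
    {"id": "p2", "price": None},
]
unique_rows = one_row_per_product(rows)
```

With a DataFrame, `products_df.to_dict('records')` produces rows in this shape; `products_df.drop_duplicates('id')` is a shorter alternative when any variant will do.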

## 4. Recommendation Engine Performance Testing

In [None]:
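# Test different recommendation strategies
strategies = ['hybrid', 'content', 'collaborative', 'trending', 'frequently_bought_together']
performance_results = []

# Group the test users from Section 3 by profile: {profile: [user_id, ...]}
test_scenarios = test_users_df.groupby('profile')['user_id'].apply(list).to_dict()

print("🧪 Testing recommendation strategies...")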

for strategy in strategies:
    print(f"\n📊 Testing strategy: {strategy}")
    strategy_results = []
    
    # Test with different user types
    for user_type, users in test_scenarios.items():
        if user_type == 'session_ids':
            continue
            
        for user_id in users[:5]:  # Test 5 users per type
            try:
                start_time = time.time()
                
                # Make recommendation request
                response = requests.get(
                    f"{RECOMMENDATION_SERVICE_URL}/recommendations",
                    params={
                        'userId': user_id,
                        'limit': 10,
                        'algorithm': strategy
                    },
                    timeout=10
                )
                
                end_time = time.time()
                response_time = (end_time - start_time) * 1000  # Convert to ms
                
                if response.status_code == 200:
                    data = response.json()
                    recommendations = data.get('recommendations', [])
                    
                    strategy_results.append({
                        'strategy': strategy,
                        'user_type': user_type,
                        'user_id': user_id,
                        'response_time_ms': response_time,
                        'recommendation_count': len(recommendations),
                        'status': 'success',
                        'personalized': data.get('personalized', False),
                        'cached': data.get('cached', False)
                    })
                    
                else:
                    strategy_results.append({
                        'strategy': strategy,
                        'user_type': user_type,
                        'user_id': user_id,
                        'response_time_ms': response_time,
                        'recommendation_count': 0,
                        'status': 'error',
                        'personalized': False,
                        'cached': False
                    })
                    
                # Small delay between requests
                time.sleep(0.1)
                
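            except Exception as e:
                logger.error(f"Error testing {strategy} for {user_id}: {str(e)}")
                strategy_results.append({
                    'strategy': strategy,
                    'user_type': user_type,
                    'user_id': user_id,
                    'response_time_ms': np.nan,  # no valid timing for a failed request
                    'recommendation_count': 0,
                    # Label only real timeouts as such; other failures are generic exceptions
                    'status': 'timeout' if isinstance(e, requests.exceptions.Timeout) else 'exception',
                    'personalized': False,
                    'cached': False
                })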
    
    performance_results.extend(strategy_results)
    print(f"✅ Completed {len(strategy_results)} tests for {strategy}")

# Convert to DataFrame for analysis
performance_df = pd.DataFrame(performance_results)
print(f"\n📊 Performance testing completed!")
print(f"   Total tests: {len(performance_df)}")
print(f"   Success rate: {(performance_df['status'] == 'success').sum() / len(performance_df) * 100:.2f}%")

# Display summary statistics
print(f"\n📈 Response Time Statistics (ms):")
success_df = performance_df[performance_df['status'] == 'success']
if len(success_df) > 0:
    print(success_df.groupby('strategy')['response_time_ms'].agg(['mean', 'median', 'std', 'min', 'max']).round(2))

display(performance_df.head(10))
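
The loop above measures latency with `time.time()`, which follows the wall clock and can jump if the system time is adjusted. A minimal sketch of a timing helper based on the monotonic `time.perf_counter()` follows; the helper name `timed_call` is hypothetical.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed milliseconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Example with a cheap local function standing in for an HTTP request
value, elapsed = timed_call(sum, [1, 2, 3])
```

In the test loop this would wrap the request, for example `response, response_time = timed_call(requests.get, url, params=params, timeout=10)`.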

## 5. Interaction Tracking Evaluation

In [None]:
# Test interaction tracking functionality
print("🔍 Testing interaction tracking...")

interaction_types = ['view', 'add_to_cart', 'purchase', 'wishlist_add', 'wishlist_remove']
tracking_results = []

# Record initial interaction count
initial_count_query = "SELECT COUNT(*) as count FROM rec_user_interactions;"
initial_count = pd.read_sql(initial_count_query, conn)['count'].iloc[0]
print(f"📊 Initial interactions in database: {initial_count}")

# Test interaction tracking
for i, user_id in enumerate(test_users_df['user_id'].tolist()[:10]):
    for j, interaction_type in enumerate(interaction_types):
        # Pick a random product before the try block so the except branch can
        # always reference product_id; test_products exists even when the
        # product query returned nothing
        product = test_products.iloc[np.random.randint(0, len(test_products))]
        product_id = product['id']
        product_handle = product['handle']
        try:
            
            start_time = time.time()
            
            # Track interaction
            response = requests.post(
                f"{RECOMMENDATION_SERVICE_URL}/track",
                json={
                    'user_id': user_id,
                    'session_id': f'test_session_{i}',
                    'product_id': product_id,
                    'product_handle': product_handle,
                    'interaction_type': interaction_type,
                    'metadata': {
                        'category': 'backpack',
                        'price': np.random.randint(500000, 2000000),
                        'test_run': True
                    }
                },
                timeout=5
            )
            
            end_time = time.time()
            response_time = (end_time - start_time) * 1000
            
            tracking_results.append({
                'user_id': user_id,
                'product_id': product_id,
                'interaction_type': interaction_type,
                'response_time_ms': response_time,
                'status_code': response.status_code,
                'success': response.status_code == 200
            })
            
            if response.status_code == 200:
                result = response.json()
                logger.info(f"✅ Tracked {interaction_type} for {user_id}: {result}")
            else:
                logger.error(f"❌ Failed to track {interaction_type} for {user_id}: {response.status_code}")
                
            time.sleep(0.05)  # Small delay
            
        except Exception as e:
            logger.error(f"Error tracking interaction: {str(e)}")
            tracking_results.append({
                'user_id': user_id,
                'product_id': product_id,
                'interaction_type': interaction_type,
                'response_time_ms': 0,
                'status_code': 0,
                'success': False
            })

# Convert to DataFrame
tracking_df = pd.DataFrame(tracking_results)

# Verify data was stored
final_count_query = "SELECT COUNT(*) as count FROM rec_user_interactions;"
final_count = pd.read_sql(final_count_query, conn)['count'].iloc[0]
new_interactions = final_count - initial_count

print(f"\n📊 Tracking Results:")
print(f"   Total tracking attempts: {len(tracking_df)}")
print(f"   Successful tracks: {tracking_df['success'].sum()}")
print(f"   Success rate: {tracking_df['success'].mean() * 100:.2f}%")
print(f"   New interactions in DB: {new_interactions}")
print(f"   Avg response time: {tracking_df[tracking_df['success']]['response_time_ms'].mean():.2f} ms")

# Display tracking statistics by interaction type
print(f"\n📈 Tracking by interaction type:")
tracking_stats = tracking_df.groupby('interaction_type').agg({
    'success': ['count', 'sum', 'mean'],
    'response_time_ms': 'mean'
}).round(2)
print(tracking_stats)

display(tracking_df.head())
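
The cell above prints both the number of successful tracking responses and the growth of `rec_user_interactions`, but leaves the comparison to the reader. A minimal sketch of making that comparison explicit follows; the helper name `reconcile_tracking` is hypothetical, and it assumes no other client writes to the table during the test run.

```python
def reconcile_tracking(initial_count: int, final_count: int, successful_requests: int) -> dict:
    """Compare acknowledged tracking requests with rows actually added to the table.

    Assumption: nothing else writes to the table while the test is running.
    """
    stored = final_count - initial_count
    return {
        "stored": stored,
        "acknowledged": successful_requests,
        "unaccounted": successful_requests - stored,
        "consistent": stored == successful_requests,
    }


summary = reconcile_tracking(initial_count=1000, final_count=1048, successful_requests=50)
```

A positive `unaccounted` value would suggest the service acknowledged events it did not persist (or persists them asynchronously); a negative value would point to concurrent writers.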

## 6. Response Time Analysis

In [None]:
# Detailed response time analysis and visualization
print("⏱️ Performing detailed response time analysis...")

# Filter successful requests for analysis
success_performance = performance_df[performance_df['status'] == 'success'].copy()

# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Response time by strategy
sns.boxplot(data=success_performance, x='strategy', y='response_time_ms', ax=axes[0,0])
axes[0,0].set_title('Response Time Distribution by Strategy')
axes[0,0].set_xlabel('Strategy')
axes[0,0].set_ylabel('Response Time (ms)')
axes[0,0].tick_params(axis='x', rotation=45)

# 2. Response time by user type
sns.boxplot(data=success_performance, x='user_type', y='response_time_ms', ax=axes[0,1])
axes[0,1].set_title('Response Time Distribution by User Type')
axes[0,1].set_xlabel('User Type')
axes[0,1].set_ylabel('Response Time (ms)')
axes[0,1].tick_params(axis='x', rotation=45)

# 3. Cache hit analysis
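# Force both columns in a fixed order so the legend labels below always match the data
cache_analysis = success_performance.groupby(['strategy', 'cached']).size().unstack(fill_value=0).reindex(columns=[False, True], fill_value=0)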
cache_analysis.plot(kind='bar', ax=axes[1,0], stacked=True)
axes[1,0].set_title('Cache Hit/Miss by Strategy')
axes[1,0].set_xlabel('Strategy')
axes[1,0].set_ylabel('Count')
axes[1,0].legend(['Not Cached', 'Cached'])
axes[1,0].tick_params(axis='x', rotation=45)

# 4. Personalization rate
personalization = success_performance.groupby('strategy')['personalized'].mean()
personalization.plot(kind='bar', ax=axes[1,1], color='lightcoral')
axes[1,1].set_title('Personalization Rate by Strategy')
axes[1,1].set_xlabel('Strategy')
axes[1,1].set_ylabel('Personalization Rate')
axes[1,1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Calculate key performance metrics
print(f"\n📊 Key Performance Metrics:")

# Overall statistics
print(f"\n🎯 Overall Performance:")
overall_stats = success_performance['response_time_ms'].describe()
print(f"   Mean response time: {overall_stats['mean']:.2f} ms")
print(f"   Median response time: {overall_stats['50%']:.2f} ms")
print(f"   95th percentile: {success_performance['response_time_ms'].quantile(0.95):.2f} ms")
print(f"   99th percentile: {success_performance['response_time_ms'].quantile(0.99):.2f} ms")

# Cache performance
cache_hit_rate = success_performance['cached'].mean()
print(f"\n💾 Cache Performance:")
print(f"   Cache hit rate: {cache_hit_rate:.2%}")
print(f"   Avg cached response time: {success_performance[success_performance['cached']]['response_time_ms'].mean():.2f} ms")
print(f"   Avg non-cached response time: {success_performance[~success_performance['cached']]['response_time_ms'].mean():.2f} ms")

# Strategy comparison
print(f"\n📈 Strategy Performance Comparison:")
strategy_stats = success_performance.groupby('strategy').agg({
    'response_time_ms': ['mean', 'median'],
    'cached': 'mean',
    'personalized': 'mean',
    'recommendation_count': 'mean'
}).round(2)
strategy_stats.columns = ['Avg_Time_ms', 'Median_Time_ms', 'Cache_Hit_Rate', 'Personalization_Rate', 'Avg_Recommendations']
print(strategy_stats)

# Performance goals assessment
print(f"\n🎯 Performance Goals Assessment:")
target_p95 = 500  # ms
actual_p95 = success_performance['response_time_ms'].quantile(0.95)
target_cache_hit = 0.80
actual_cache_hit = cache_hit_rate

print(f"   P95 Response Time: {actual_p95:.2f}ms (Target: <{target_p95}ms) {'✅' if actual_p95 < target_p95 else '❌'}")
print(f"   Cache Hit Rate: {actual_cache_hit:.2%} (Target: >{target_cache_hit:.0%}) {'✅' if actual_cache_hit > target_cache_hit else '❌'}")
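
The assessment above compares a single overall P95 with the 500 ms target, whereas the stated targets distinguish cached responses (< 500 ms) from computed ones (< 2 s). A minimal sketch of checking the two groups separately follows. The helper names are hypothetical, and the nearest-rank percentile used here can differ slightly from pandas' interpolating `quantile()`.

```python
import math
from typing import Dict, List, Optional, Tuple

# P95 targets from the Performance Targets list: cached < 500 ms, computed < 2 s
TARGET_P95_MS = {"cached": 500.0, "computed": 2000.0}


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_latency_targets(samples: List[Tuple[float, bool]]) -> Dict[str, Optional[dict]]:
    """samples holds (response_time_ms, cached) pairs; groups with no data map to None."""
    report: Dict[str, Optional[dict]] = {}
    for label, target in TARGET_P95_MS.items():
        want_cached = label == "cached"
        times = [ms for ms, cached in samples if cached == want_cached]
        if not times:
            report[label] = None
            continue
        p95 = nearest_rank_percentile(times, 95)
        report[label] = {"p95_ms": p95, "target_ms": target, "passed": p95 < target}
    return report


latency_report = check_latency_targets(
    [(100.0, True), (120.0, True), (600.0, True), (900.0, False), (1500.0, False)]
)
```

From the notebook's data, the samples could be built with `list(zip(success_performance['response_time_ms'], success_performance['cached']))`.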

## 7. Accuracy Metrics Calculation

In [None]:
# Calculate recommendation accuracy metrics
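print("🎯 Calculating recommendation accuracy metrics...")

# Simulate ground truth data based on user interactions
# In a real scenario, this would be based on actual click/purchase data.
# Note: these are interactions the engine may already have used as input, so the
# resulting metrics measure agreement with past behavior, not predictive accuracy.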
def generate_ground_truth(user_interactions, top_n=10):
    """Generate ground truth recommendations based on user behavior"""
    user_product_matrix = {}
    
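    for _, interaction in user_interactions.iterrows():
        user_id = interaction['user_id']
        product_id = interaction['product_id']
        weight = interaction['weight']

        # Rows without a user_id (e.g. session-only interactions) cannot be evaluated per user
        if pd.isna(user_id):
            continue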
        
        if user_id not in user_product_matrix:
            user_product_matrix[user_id] = {}
        
        if product_id not in user_product_matrix[user_id]:
            user_product_matrix[user_id][product_id] = 0
        
        user_product_matrix[user_id][product_id] += weight
    
    # Generate ground truth: top products for each user
    ground_truth = {}
    for user_id, products in user_product_matrix.items():
        sorted_products = sorted(products.items(), key=lambda x: x[1], reverse=True)
        ground_truth[user_id] = [prod_id for prod_id, _ in sorted_products[:top_n]]
    
    return ground_truth

# Generate ground truth from existing interactions
ground_truth = generate_ground_truth(interactions_df)
print(f"📊 Generated ground truth for {len(ground_truth)} users")

# Test recommendation accuracy for users with existing interactions
accuracy_results = []
# Named separately so the `test_users` list from Section 3 is not overwritten
accuracy_test_users = list(ground_truth.keys())[:20]  # Test with 20 users

for user_id in accuracy_test_users:
    try:
        # Get recommendations for this user
        response = requests.get(
            f"{RECOMMENDATION_SERVICE_URL}/recommendations",
            params={'userId': user_id, 'limit': 10, 'algorithm': 'hybrid'},
            timeout=5
        )
        
        if response.status_code == 200:
            data = response.json()
            recommended_products = [rec['product_id'] for rec in data.get('recommendations', [])]
            ground_truth_products = ground_truth.get(user_id, [])
            
            # Calculate metrics
            if len(recommended_products) > 0 and len(ground_truth_products) > 0:
                # Intersection of recommended and ground truth
                intersection = set(recommended_products) & set(ground_truth_products)
                
                # Precision: relevant items retrieved / total items retrieved
                precision = len(intersection) / len(recommended_products) if recommended_products else 0
                
                # Recall: relevant items retrieved / total relevant items
                recall = len(intersection) / len(ground_truth_products) if ground_truth_products else 0
                
                # F1 Score
                f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
                
                # Coverage: number of unique products in this user's list (not catalog coverage)
                coverage = len(set(recommended_products))
                
                accuracy_results.append({
                    'user_id': user_id,
                    'precision': precision,
                    'recall': recall,
                    'f1_score': f1,
                    'coverage': coverage,
                    'recommended_count': len(recommended_products),
                    'ground_truth_count': len(ground_truth_products),
                    'intersection_count': len(intersection)
                })
                
    except Exception as e:
        logger.error(f"Error calculating accuracy for user {user_id}: {str(e)}")

# Convert to DataFrame
accuracy_df = pd.DataFrame(accuracy_results)

if len(accuracy_df) > 0:
    # Calculate overall metrics
    print(f"\n📊 Accuracy Metrics Summary:")
    print(f"   Users tested: {len(accuracy_df)}")
    print(f"   Average Precision: {accuracy_df['precision'].mean():.3f}")
    print(f"   Average Recall: {accuracy_df['recall'].mean():.3f}")
    print(f"   Average F1-Score: {accuracy_df['f1_score'].mean():.3f}")
    print(f"   Average Coverage: {accuracy_df['coverage'].mean():.1f} products")
    
    # Distribution analysis
    print(f"\n📈 Metrics Distribution:")
    print(accuracy_df[['precision', 'recall', 'f1_score']].describe().round(3))
    
    # Visualize accuracy metrics
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Precision distribution
    accuracy_df['precision'].hist(bins=15, ax=axes[0], alpha=0.7, color='skyblue')
    axes[0].axvline(accuracy_df['precision'].mean(), color='red', linestyle='--', 
                   label=f'Mean: {accuracy_df["precision"].mean():.3f}')
    axes[0].set_title('Precision Distribution')
    axes[0].set_xlabel('Precision')
    axes[0].set_ylabel('Frequency')
    axes[0].legend()
    
    # Recall distribution
    accuracy_df['recall'].hist(bins=15, ax=axes[1], alpha=0.7, color='lightgreen')
    axes[1].axvline(accuracy_df['recall'].mean(), color='red', linestyle='--',
                   label=f'Mean: {accuracy_df["recall"].mean():.3f}')
    axes[1].set_title('Recall Distribution')
    axes[1].set_xlabel('Recall')
    axes[1].set_ylabel('Frequency')
    axes[1].legend()
    
    # F1-Score distribution
    accuracy_df['f1_score'].hist(bins=15, ax=axes[2], alpha=0.7, color='lightcoral')
    axes[2].axvline(accuracy_df['f1_score'].mean(), color='red', linestyle='--',
                   label=f'Mean: {accuracy_df["f1_score"].mean():.3f}')
    axes[2].set_title('F1-Score Distribution')
    axes[2].set_xlabel('F1-Score')
    axes[2].set_ylabel('Frequency')
    axes[2].legend()
    
    plt.tight_layout()
    plt.show()
    
    display(accuracy_df.head())
    
else:
    print("⚠️ No accuracy data available for analysis")
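
The per-user metrics computed in the loop above are set-based precision, recall, and F1 over the recommended list and the ground-truth list. A minimal standalone sketch with a worked example follows; the function name `precision_recall_f1` is hypothetical, and the choice to de-duplicate the recommended list before counting is an assumption.

```python
from typing import Iterable, Tuple


def precision_recall_f1(recommended: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one user.

    precision = hits / number of recommended items
    recall    = hits / number of relevant items
    """
    rec = list(dict.fromkeys(recommended))  # drop duplicates, keep order
    rel = set(relevant)
    if not rec or not rel:
        return 0.0, 0.0, 0.0
    hits = sum(1 for pid in rec if pid in rel)
    precision = hits / len(rec)
    recall = hits / len(rel)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1


# 4 recommended, 3 relevant, 2 in common: precision 2/4, recall 2/3, F1 4/7
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
```

Since the ground truth is capped at 10 products and the request limit is also 10, precision and recall coincide whenever both lists are full, which is worth keeping in mind when reading the histograms.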

## 8. Generate Evaluation Report

In [None]:
# Generate comprehensive evaluation report
print("📊 Generating Comprehensive Evaluation Report...")
print("=" * 80)

# Report timestamp
report_time = datetime.now()
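print(f"📅 Report Generated: {report_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🎯 Service Evaluated: Recommendation Engine (Port 8001)")
# Elapsed time needs a start timestamp; `evaluation_start` is expected to be set
# (e.g. evaluation_start = datetime.now()) before the first test cell runs
evaluation_start = globals().get('evaluation_start')
if evaluation_start is not None:
    print(f"⏱️ Evaluation Duration: {(report_time - evaluation_start).total_seconds() / 60:.2f} minutes")
else:
    print("⏱️ Evaluation Duration: n/a (no start timestamp recorded)")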

print("\n" + "="*80)
print("📈 EXECUTIVE SUMMARY")
print("="*80)

# Calculate overall scores
if len(success_performance) > 0:
    avg_response_time = success_performance['response_time_ms'].mean()
    p95_response_time = success_performance['response_time_ms'].quantile(0.95)
    cache_hit_rate = success_performance['cached'].mean()
    success_rate = len(success_performance) / len(performance_df)
    
    print(f"✅ Overall Service Health: {'EXCELLENT' if success_rate > 0.95 else 'GOOD' if success_rate > 0.8 else 'NEEDS IMPROVEMENT'}")
    print(f"⚡ Performance Grade: {'A' if p95_response_time < 300 else 'B' if p95_response_time < 500 else 'C'}")
    print(f"💾 Cache Efficiency: {'HIGH' if cache_hit_rate > 0.8 else 'MEDIUM' if cache_hit_rate > 0.6 else 'LOW'}")

print("\n" + "="*80)
print("🎯 KEY PERFORMANCE INDICATORS")
print("="*80)

# Performance KPIs
if len(success_performance) > 0:
    print(f"📊 Response Time Metrics:")
    print(f"   • Average Response Time: {avg_response_time:.2f} ms")
    print(f"   • Median Response Time: {success_performance['response_time_ms'].median():.2f} ms")
    print(f"   • 95th Percentile: {p95_response_time:.2f} ms")
    print(f"   • 99th Percentile: {success_performance['response_time_ms'].quantile(0.99):.2f} ms")
    
    print(f"\n💾 Cache Performance:")
    print(f"   • Cache Hit Rate: {cache_hit_rate:.2%}")
    if cache_hit_rate > 0:
        cached_avg = success_performance[success_performance['cached']]['response_time_ms'].mean()
        uncached_avg = success_performance[~success_performance['cached']]['response_time_ms'].mean()
        print(f"   • Cached Response Time: {cached_avg:.2f} ms")
        print(f"   • Uncached Response Time: {uncached_avg:.2f} ms")
        print(f"   • Cache Speedup: {uncached_avg / cached_avg:.1f}x faster")
    
    print(f"\n🎯 Service Reliability:")
    print(f"   • Success Rate: {success_rate:.2%}")
    print(f"   • Total Requests: {len(performance_df)}")
    print(f"   • Failed Requests: {len(performance_df) - len(success_performance)}")

# Tracking KPIs
if len(tracking_df) > 0:
    tracking_success_rate = tracking_df['success'].mean()
    print(f"\n📍 Interaction Tracking:")
    print(f"   • Tracking Success Rate: {tracking_success_rate:.2%}")
    print(f"   • Average Tracking Time: {tracking_df[tracking_df['success']]['response_time_ms'].mean():.2f} ms")
    print(f"   • Total Interactions Tracked: {tracking_df['success'].sum()}")

# Accuracy KPIs
if len(accuracy_df) > 0:
    print(f"\n🎯 Recommendation Accuracy:")
    print(f"   • Average Precision: {accuracy_df['precision'].mean():.3f}")
    print(f"   • Average Recall: {accuracy_df['recall'].mean():.3f}")
    print(f"   • Average F1-Score: {accuracy_df['f1_score'].mean():.3f}")
    print(f"   • Average Coverage: {accuracy_df['coverage'].mean():.1f} products")

print("\n" + "="*80)
print("📋 STRATEGY PERFORMANCE COMPARISON")
print("="*80)

if len(success_performance) > 0:
    strategy_comparison = success_performance.groupby('strategy').agg({
        'response_time_ms': ['mean', 'median'],
        'cached': 'mean',
        'personalized': 'mean'
    }).round(2)
    
    print("\nStrategy Performance Summary:")
    print(strategy_comparison.to_string())

print("\n" + "="*80)
print("✅ GOALS ACHIEVEMENT ASSESSMENT")
print("="*80)

# Define targets and assess.
# The latency metric is None when nothing was measured, so a missing
# measurement cannot register as a false PASS (0 ms <= 500 ms).
targets = {
    'response_time_p95': {'target': 500, 'actual': p95_response_time if len(success_performance) > 0 else None, 'unit': 'ms'},
    'cache_hit_rate': {'target': 0.80, 'actual': cache_hit_rate if len(success_performance) > 0 else 0, 'unit': '%'},
    'success_rate': {'target': 0.95, 'actual': success_rate if len(success_performance) > 0 else 0, 'unit': '%'},
    'tracking_success': {'target': 0.95, 'actual': tracking_success_rate if len(tracking_df) > 0 else 0, 'unit': '%'}
}

for metric, data in targets.items():
    target = data['target']
    actual = data['actual']
    unit = data['unit']
    label = metric.replace('_', ' ').title()
    
    if actual is None:
        print(f"{label}: no data (Target: <{target}{unit}) ❌ FAIL")
    elif unit == '%':
        # Rate metrics pass when they meet or exceed the target
        status = '✅ PASS' if actual >= target else '❌ FAIL'
        print(f"{label}: {actual:.2%} (Target: {target:.0%}) {status}")
    else:
        # Latency metrics pass when they stay at or below the target
        status = '✅ PASS' if actual <= target else '❌ FAIL'
        print(f"{label}: {actual:.2f}{unit} (Target: <{target}{unit}) {status}")

print("\n" + "="*80)
print("🔧 RECOMMENDATIONS FOR IMPROVEMENT")
print("="*80)

recommendations = []

# Performance recommendations
if len(success_performance) > 0:
    if p95_response_time > 500:
        recommendations.append("⚡ Optimize response time: P95 exceeds 500ms target")
    if cache_hit_rate < 0.80:
        recommendations.append("💾 Improve caching strategy: Cache hit rate below 80%")
    if success_rate < 0.95:
        recommendations.append("🛠️ Improve error handling: Success rate below 95%")

# Strategy-specific recommendations
if len(success_performance) > 0:
    strategy_means = success_performance.groupby('strategy')['response_time_ms'].mean()
    slowest_strategy = strategy_means.idxmax()
    # Flag the slowest strategy only if its mean latency exceeds 300 ms
    if strategy_means[slowest_strategy] > 300:
        recommendations.append(f"🔄 Optimize '{slowest_strategy}' algorithm: slowest strategy, mean {strategy_means[slowest_strategy]:.0f} ms (> 300 ms)")

# General recommendations
recommendations.extend([
    "📊 Implement A/B testing for different recommendation strategies",
    "🔍 Add more detailed user behavior tracking",
    "🎯 Implement click-through rate measurement",
    "📈 Set up automated performance monitoring"
])

for i, rec in enumerate(recommendations, 1):
    print(f"{i}. {rec}")

print("\n" + "="*80)
print("💾 DATA EXPORT")
print("="*80)

# Save evaluation results
results = {
    'performance_data': performance_df.to_dict('records') if len(performance_df) > 0 else [],
    'tracking_data': tracking_df.to_dict('records') if len(tracking_df) > 0 else [],
    'accuracy_data': accuracy_df.to_dict('records') if len(accuracy_df) > 0 else [],
    'summary_metrics': {
        'avg_response_time': avg_response_time if len(success_performance) > 0 else 0,
        'p95_response_time': p95_response_time if len(success_performance) > 0 else 0,
        'cache_hit_rate': cache_hit_rate if len(success_performance) > 0 else 0,
        'success_rate': success_rate if len(success_performance) > 0 else 0,
        'total_requests': len(performance_df),
        'evaluation_timestamp': report_time.isoformat()
    }
}

# Save to JSON file (the output directory is machine-specific; adjust as needed)
import os
results_dir = 'd:/Edu/graduation-project/report/evaluation/results'
os.makedirs(results_dir, exist_ok=True)
results_file = f'{results_dir}/recommendation_evaluation_{report_time.strftime("%Y%m%d_%H%M%S")}.json'

# default=str serializes values json cannot encode natively (e.g. timestamps)
with open(results_file, 'w', encoding='utf-8') as f:
    json.dump(results, f, indent=2, default=str)

print(f"📁 Evaluation results saved to: {results_file}")

print("\n" + "="*80)
print("🎉 EVALUATION COMPLETED SUCCESSFULLY!")
print("="*80)
print(f"📊 Total Test Cases: {len(performance_df) + len(tracking_df)}")
print(f"⏱️ Report Generated At: {report_time.strftime('%H:%M:%S')}")
print(f"💾 Results Saved: {results_file}")
print("✅ Ready for production deployment assessment!")