# 🧴 Skincare Recommendation Algorithm Evaluation

## 📋 Evaluation Goals
This notebook evaluates the performance of a skincare recommendation algorithm built from:
- **Content-Based Filtering (CBF)** with TF-IDF
- **K-Nearest Neighbors (KNN)** enhancement
- **Hybrid System**: CBF + Aggressive KNN

## 📊 Evaluation Metrics
- **Precision@K**: share of the top-K recommendations that are relevant
- **Recall@K**: share of all relevant items that appear in the top-K
- **F1-Score@K**: harmonic mean of Precision@K and Recall@K
- **NDCG@K**: ranking quality, with higher positions weighted more
- **MAP**: Mean Average Precision

---

## 🚀 Step 1: Environment Setup

First, we install and import all the required libraries.

In [None]:
# Install required libraries
%pip install numpy pandas scikit-learn matplotlib seaborn

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler
import random
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)
random.seed(42)

print("✅ Libraries imported successfully!")
print(f"📦 NumPy version: {np.__version__}")
print(f"📦 Pandas version: {pd.__version__}")

### ✅ **Checkpoint 1**: Environment Ready!
If the cells above ran without errors, the environment is ready to use.

## 📊 Step 2: Data Preparation

We generate a synthetic dataset to simulate the evaluation.

In [None]:
# Generate synthetic product data
def generate_product_data(n_products=50):
    """
    Generate synthetic skincare product data
    """
    # Skincare categories and ingredients
    categories = ['cleanser', 'moisturizer', 'serum', 'sunscreen', 'toner', 'mask']
    skin_types = ['oily', 'dry', 'combination', 'sensitive', 'normal']
    concerns = ['acne', 'aging', 'brightening', 'hydration', 'pores', 'dark_spots']
    ingredients = ['niacinamide', 'hyaluronic_acid', 'vitamin_c', 'retinol', 'salicylic_acid', 
                  'ceramide', 'peptides', 'glycolic_acid', 'zinc', 'tea_tree']
    
    products = []
    for i in range(n_products):
        # Random product attributes
        category = np.random.choice(categories)
        suitable_skin = np.random.choice(skin_types, size=np.random.randint(1, 3), replace=False)
        target_concerns = np.random.choice(concerns, size=np.random.randint(1, 3), replace=False)
        product_ingredients = np.random.choice(ingredients, size=np.random.randint(2, 5), replace=False)
        
        # Create description
        description = f"{category} for {' '.join(suitable_skin)} skin targeting {' '.join(target_concerns)} with {' '.join(product_ingredients)}"
        
        products.append({
            'product_id': f'P{i+1:03d}',
            'name': f'{category.title()} {i+1}',
            'category': category,
            'skin_type': ' '.join(suitable_skin),
            'concerns': ' '.join(target_concerns),
            'ingredients': ' '.join(product_ingredients),
            'description': description,
            'price': np.random.randint(50, 500) * 1000,  # Price in IDR
            'rating': round(np.random.uniform(3.5, 5.0), 1)
        })
    
    return pd.DataFrame(products)

# Generate products
products_df = generate_product_data(50)
print(f"✅ Generated {len(products_df)} products")
print("\n📋 Sample products:")
products_df.head()

In [None]:
# Generate synthetic user data
def generate_user_data(n_users=20):
    """
    Generate synthetic user preference data
    """
    skin_types = ['oily', 'dry', 'combination', 'sensitive', 'normal']
    concerns = ['acne', 'aging', 'brightening', 'hydration', 'pores', 'dark_spots']
    
    users = []
    for i in range(n_users):
        # Random user preferences
        user_skin_type = np.random.choice(skin_types)
        user_concerns = np.random.choice(concerns, size=np.random.randint(1, 4), replace=False)
        
        # Create user profile description
        profile = f"{user_skin_type} skin with concerns about {' '.join(user_concerns)}"
        
        users.append({
            'user_id': f'U{i+1:03d}',
            'skin_type': user_skin_type,
            'concerns': ' '.join(user_concerns),
            'profile': profile,
            'age': np.random.randint(18, 50),
            'budget': np.random.choice(['low', 'medium', 'high'])
        })
    
    return pd.DataFrame(users)

# Generate users
users_df = generate_user_data(20)
print(f"✅ Generated {len(users_df)} users")
print("\n👥 Sample users:")
users_df.head()

### ✅ **Checkpoint 2**: Data Ready!
The synthetic dataset has been generated:
- **50 skincare products** across several categories and ingredients
- **20 users** with varied preferences

## 🤖 Step 3: Implementing the Recommendation Algorithm

Next, we implement the hybrid recommendation system.

In [None]:
class SkincareRecommendationSystem:
    def __init__(self, products_df, users_df):
        self.products_df = products_df.copy()
        self.users_df = users_df.copy()
        self.tfidf_vectorizer = None
        self.product_vectors = None
        self.user_vectors = None
        
    def prepare_features(self):
        """
        Prepare TF-IDF features for products and users
        """
        # Combine all text features for products
        self.products_df['combined_features'] = (
            self.products_df['category'] + ' ' +
            self.products_df['skin_type'] + ' ' +
            self.products_df['concerns'] + ' ' +
            self.products_df['ingredients']
        )
        
        # Combine all text features for users
        self.users_df['combined_features'] = (
            self.users_df['skin_type'] + ' ' +
            self.users_df['concerns']
        )
        
        # Create TF-IDF vectors
        all_features = list(self.products_df['combined_features']) + list(self.users_df['combined_features'])
        
        self.tfidf_vectorizer = TfidfVectorizer(
            stop_words='english',
            max_features=1000,
            ngram_range=(1, 2)
        )
        
        all_vectors = self.tfidf_vectorizer.fit_transform(all_features)
        
        # Split vectors back to products and users
        n_products = len(self.products_df)
        self.product_vectors = all_vectors[:n_products]
        self.user_vectors = all_vectors[n_products:]
        
        print(f"✅ TF-IDF features prepared: {self.product_vectors.shape[1]} features")
    
    def content_based_filtering(self, user_idx, top_k=10):
        """
        Content-Based Filtering using TF-IDF and cosine similarity
        """
        if self.product_vectors is None:
            self.prepare_features()
        
        # Calculate similarity between user and all products
        user_vector = self.user_vectors[user_idx]
        similarities = cosine_similarity(user_vector, self.product_vectors).flatten()
        
        # Get top-k products
        top_indices = similarities.argsort()[-top_k:][::-1]
        
        results = []
        for idx in top_indices:
            results.append({
                'product_id': self.products_df.iloc[idx]['product_id'],
                'name': self.products_df.iloc[idx]['name'],
                'similarity_score': similarities[idx],
                'product_idx': idx
            })
        
        return results
    
    def find_similar_users(self, user_idx, k=5):
        """
        Find K most similar users using cosine similarity
        """
        if self.user_vectors is None:
            self.prepare_features()
        
        # Calculate similarity between users
        user_vector = self.user_vectors[user_idx]
        similarities = cosine_similarity(user_vector, self.user_vectors).flatten()
        
        # Exclude self and get top-k
        similarities[user_idx] = -1  # Exclude self
        top_k_indices = similarities.argsort()[-k:][::-1]
        
        return top_k_indices, similarities[top_k_indices]
    
    def aggressive_knn_scoring(self, cbf_scores, similar_users, k):
        """
        Apply aggressive KNN scoring with K-dependent parameters
        """
        # K-dependent parameters
        if k <= 3:
            position_multiplier = 0.3
            exclusivity_bonus = 0.4
            exclusion_penalty = 0.6
        elif k <= 5:
            position_multiplier = 0.25
            exclusivity_bonus = 0.3
            exclusion_penalty = 0.4
        else:
            position_multiplier = 0.2
            exclusivity_bonus = 0.2
            exclusion_penalty = 0.2
        
        enhanced_scores = cbf_scores.copy()
        
        # Simulate KNN recommendations for similar users
        for i, similar_user_idx in enumerate(similar_users):
            # Get CBF recommendations for similar user
            similar_user_recs = self.content_based_filtering(similar_user_idx, top_k=len(cbf_scores))
            
            for j, rec in enumerate(similar_user_recs):
                product_idx = rec['product_idx']
                
                if j < k:  # Product is in top-K for similar user
                    # Position-based boost
                    position_boost = position_multiplier * (k - j) / k
                    
                    # Neighbor-rank bonus: scales with the neighbor's rank i
                    # (neighbors are ordered most-similar first, so later ones get more)
                    exclusivity = exclusivity_bonus * (i + 1) / len(similar_users)
                    
                    # Apply boost
                    boost = position_boost + exclusivity
                    enhanced_scores[product_idx] += boost
                else:
                    # Exclusion penalty for products not in top-K
                    penalty = exclusion_penalty * 0.1
                    enhanced_scores[product_idx] = max(0, enhanced_scores[product_idx] - penalty)
        
        return enhanced_scores
    
    def hybrid_recommend(self, user_idx, k=5, top_n=10):
        """
        Hybrid recommendation combining CBF and aggressive KNN
        """
        # Get CBF recommendations
        cbf_results = self.content_based_filtering(user_idx, top_k=len(self.products_df))
        
        # Extract scores
        cbf_scores = np.zeros(len(self.products_df))
        for result in cbf_results:
            cbf_scores[result['product_idx']] = result['similarity_score']
        
        # Find similar users
        similar_users, user_similarities = self.find_similar_users(user_idx, k)
        
        # Apply aggressive KNN scoring
        final_scores = self.aggressive_knn_scoring(cbf_scores, similar_users, k)
        
        # Get top-N recommendations
        top_indices = final_scores.argsort()[-top_n:][::-1]
        
        recommendations = []
        for idx in top_indices:
            recommendations.append({
                'product_id': self.products_df.iloc[idx]['product_id'],
                'name': self.products_df.iloc[idx]['name'],
                'cbf_score': cbf_scores[idx],
                'final_score': final_scores[idx],
                'product_idx': idx
            })
        
        return recommendations, similar_users

# Initialize recommendation system
rec_system = SkincareRecommendationSystem(products_df, users_df)
print("✅ Recommendation system initialized!")

### ✅ **Checkpoint 3**: Algorithm Ready!
The hybrid recommendation system is implemented with:
- **Content-Based Filtering** using TF-IDF
- **K-Nearest Neighbors** enhancement
- **Aggressive KNN scoring** with K-dependent parameters
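
As a minimal, standalone sketch of the content-based step, the example below fits TF-IDF on a few made-up product descriptions plus one made-up user profile and ranks the products by cosine similarity. The texts and variable names are illustrative assumptions, not the notebook's generated data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy texts (not the notebook's generated data)
toy_products = [
    "cleanser oily acne salicylic_acid",
    "moisturizer dry hydration ceramide",
    "serum oily pores niacinamide",
]
toy_user = "oily acne pores"

# Fit on products and the user together, as the notebook's class does
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(toy_products + [toy_user])
product_vecs, user_vec = vectors[:-1], vectors[-1]

# Cosine similarity between the user and every product
scores = cosine_similarity(user_vec, product_vecs).flatten()
ranking = scores.argsort()[::-1]  # best match first
for idx in ranking:
    print(f"{toy_products[idx]!r}: {scores[idx]:.3f}")
```

The moisturizer shares no terms with the user profile, so its similarity is zero and it ranks last.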

## 🎯 Step 4: Testing the Recommendation System

Let's test the system on a sample user.

In [None]:
# Test with sample user
test_user_idx = 0
test_user = users_df.iloc[test_user_idx]

print(f"🧑‍💼 Testing with User: {test_user['user_id']}")
print(f"👤 Profile: {test_user['profile']}")
print("\n" + "="*60)

# Test with different K values
for k in [3, 5, 7]:
    print(f"\n🔍 Testing with K={k}:")
    recommendations, similar_users = rec_system.hybrid_recommend(test_user_idx, k=k, top_n=5)
    
    print(f"\n📊 Top 5 Recommendations (K={k}):")
    for i, rec in enumerate(recommendations, 1):
        print(f"{i}. {rec['name']} (Score: {rec['final_score']:.3f})")
    
    print(f"\n👥 Similar Users: {[f'U{idx+1:03d}' for idx in similar_users]}")
    print("-" * 40)

### ✅ **Checkpoint 4**: System Working!
The system produces recommendations for each tested value of K.

## 📋 Step 5: Generate Ground Truth

We need a ground truth to evaluate against.

In [None]:
def generate_ground_truth(users_df, products_df):
    """
    Generate ground truth based on user-product matching logic
    """
    ground_truth = {}
    
    for user_idx, user in users_df.iterrows():
        user_skin_type = user['skin_type']
        user_concerns = set(user['concerns'].split())
        
        relevant_products = []
        
        for prod_idx, product in products_df.iterrows():
            # Check skin type compatibility
            product_skin_types = set(product['skin_type'].split())
            skin_match = user_skin_type in product_skin_types or 'normal' in product_skin_types
            
            # Check concern overlap
            product_concerns = set(product['concerns'].split())
            concern_overlap = len(user_concerns.intersection(product_concerns)) > 0
            
            # Product is relevant if it matches skin type AND addresses concerns
            if skin_match and concern_overlap:
                # Calculate relevance score based on overlap
                overlap_ratio = len(user_concerns.intersection(product_concerns)) / len(user_concerns)
                
                relevant_products.append({
                    'product_idx': prod_idx,
                    'product_id': product['product_id'],
                    'relevance_score': overlap_ratio
                })
        
        # Sort by relevance score and take top products
        relevant_products.sort(key=lambda x: x['relevance_score'], reverse=True)
        
        ground_truth[user_idx] = {
            'relevant_products': [p['product_idx'] for p in relevant_products],
            'relevance_scores': {p['product_idx']: p['relevance_score'] for p in relevant_products}
        }
    
    return ground_truth

# Generate ground truth
ground_truth = generate_ground_truth(users_df, products_df)

print("✅ Ground truth generated!")
print(f"\n📊 Ground Truth Statistics:")
for user_idx in range(min(5, len(users_df))):
    n_relevant = len(ground_truth[user_idx]['relevant_products'])
    print(f"User {user_idx+1}: {n_relevant} relevant products")

### ✅ **Checkpoint 5**: Ground Truth Ready!
The ground truth is derived from skin-type compatibility and concern overlap.
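
The matching rule can be illustrated on single hand-made user/product pairs. This is a minimal sketch that mirrors the logic of `generate_ground_truth`; the helper name `relevance` and the example values are hypothetical.

```python
def relevance(user_skin, user_concerns, product_skins, product_concerns):
    """Return the overlap ratio if the product is relevant, else None."""
    # Skin match: the product lists the user's skin type, or lists 'normal'
    skin_match = user_skin in product_skins or "normal" in product_skins
    shared = set(user_concerns) & set(product_concerns)
    if skin_match and shared:
        # Fraction of the user's concerns that the product addresses
        return len(shared) / len(set(user_concerns))
    return None

# Skin type matches and one of two concerns overlaps
print(relevance("oily", ["acne", "pores"], ["oily", "dry"], ["acne"]))
# Concern overlaps but the skin type does not match
print(relevance("dry", ["aging"], ["oily"], ["aging"]))
```

The first call yields a relevance of 0.5, while the second returns `None` because the product fails the skin-type check.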

## 📊 Step 6: Implementing the Evaluation Metrics

Next, we implement the evaluation metrics.

In [None]:
class RecommendationEvaluator:
    def __init__(self, rec_system, ground_truth):
        self.rec_system = rec_system
        self.ground_truth = ground_truth
    
    def precision_at_k(self, recommended_items, relevant_items, k):
        """
        Calculate Precision@K
        """
        if k == 0:
            return 0.0
        
        recommended_k = recommended_items[:k]
        relevant_set = set(relevant_items)
        
        hits = len([item for item in recommended_k if item in relevant_set])
        return hits / k
    
    def recall_at_k(self, recommended_items, relevant_items, k):
        """
        Calculate Recall@K
        """
        if len(relevant_items) == 0:
            return 0.0
        
        recommended_k = recommended_items[:k]
        relevant_set = set(relevant_items)
        
        hits = len([item for item in recommended_k if item in relevant_set])
        return hits / len(relevant_items)
    
    def f1_score_at_k(self, recommended_items, relevant_items, k):
        """
        Calculate F1-Score@K
        """
        precision = self.precision_at_k(recommended_items, relevant_items, k)
        recall = self.recall_at_k(recommended_items, relevant_items, k)
        
        if precision + recall == 0:
            return 0.0
        
        return 2 * (precision * recall) / (precision + recall)
    
    def ndcg_at_k(self, recommended_items, relevant_items, relevance_scores, k):
        """
        Calculate NDCG@K
        """
        def dcg_at_k(scores, k):
            scores = np.array(scores[:k])
            if scores.size == 0:
                return 0.0
            return scores[0] + np.sum(scores[1:] / np.log2(np.arange(2, scores.size + 1)))
        
        # Get relevance scores for recommended items
        recommended_k = recommended_items[:k]
        actual_scores = [relevance_scores.get(item, 0.0) for item in recommended_k]
        
        # Calculate DCG
        dcg = dcg_at_k(actual_scores, k)
        
        # Calculate IDCG (ideal DCG)
        ideal_scores = sorted([relevance_scores.get(item, 0.0) for item in relevant_items], reverse=True)
        idcg = dcg_at_k(ideal_scores, k)
        
        if idcg == 0:
            return 0.0
        
        return dcg / idcg
    
    def average_precision(self, recommended_items, relevant_items):
        """
        Calculate Average Precision
        """
        if len(relevant_items) == 0:
            return 0.0
        
        relevant_set = set(relevant_items)
        hits = 0
        sum_precisions = 0.0
        
        for i, item in enumerate(recommended_items):
            if item in relevant_set:
                hits += 1
                precision_at_i = hits / (i + 1)
                sum_precisions += precision_at_i
        
        return sum_precisions / len(relevant_items)
    
    def evaluate_user(self, user_idx, k_neighbors, k_values=[3, 5, 7, 10]):
        """
        Evaluate recommendations for a single user
        """
        # Get recommendations
        recommendations, _ = self.rec_system.hybrid_recommend(user_idx, k=k_neighbors, top_n=max(k_values))
        recommended_items = [rec['product_idx'] for rec in recommendations]
        
        # Get ground truth
        relevant_items = self.ground_truth[user_idx]['relevant_products']
        relevance_scores = self.ground_truth[user_idx]['relevance_scores']
        
        # Calculate metrics for different K values
        metrics = {}
        for k in k_values:
            metrics[k] = {
                'precision': self.precision_at_k(recommended_items, relevant_items, k),
                'recall': self.recall_at_k(recommended_items, relevant_items, k),
                'f1_score': self.f1_score_at_k(recommended_items, relevant_items, k),
                'ndcg': self.ndcg_at_k(recommended_items, relevant_items, relevance_scores, k)
            }
        
        # Calculate MAP
        map_score = self.average_precision(recommended_items, relevant_items)
        
        return metrics, map_score
    
    def evaluate_all_users(self, k_neighbors, k_values=[3, 5, 7, 10]):
        """
        Evaluate recommendations for all users
        """
        all_metrics = {k: {'precision': [], 'recall': [], 'f1_score': [], 'ndcg': []} for k in k_values}
        all_map_scores = []
        
        for user_idx in range(len(self.rec_system.users_df)):
            user_metrics, map_score = self.evaluate_user(user_idx, k_neighbors, k_values)
            
            for k in k_values:
                for metric in ['precision', 'recall', 'f1_score', 'ndcg']:
                    all_metrics[k][metric].append(user_metrics[k][metric])
            
            all_map_scores.append(map_score)
        
        # Calculate averages
        avg_metrics = {}
        for k in k_values:
            avg_metrics[k] = {}
            for metric in ['precision', 'recall', 'f1_score', 'ndcg']:
                avg_metrics[k][metric] = np.mean(all_metrics[k][metric])
        
        avg_map = np.mean(all_map_scores)
        
        return avg_metrics, avg_map

# Initialize evaluator
evaluator = RecommendationEvaluator(rec_system, ground_truth)
print("✅ Evaluator initialized!")

### ✅ **Checkpoint 6**: Metrics Ready!
All evaluation metrics are implemented:
- **Precision@K, Recall@K, F1-Score@K**
- **NDCG@K** for ranking quality
- **MAP** for overall performance

## 🔍 Step 7: Running the Full Evaluation

We now run the full evaluation across several values of K.

In [None]:
# Run full evaluation
k_neighbors_list = [3, 5, 7, 10]
k_values = [3, 5, 7, 10]

print("🔍 Running Full Evaluation...")
print("=" * 50)

evaluation_results = {}

for k_neighbors in k_neighbors_list:
    print(f"\n📊 Evaluating with K_neighbors = {k_neighbors}")
    
    avg_metrics, avg_map = evaluator.evaluate_all_users(k_neighbors, k_values)
    
    evaluation_results[k_neighbors] = {
        'metrics': avg_metrics,
        'map': avg_map
    }
    
    print(f"MAP: {avg_map:.3f}")
    
    # Display results for each K
    for k in k_values:
        metrics = avg_metrics[k]
        print(f"  K={k}: P={metrics['precision']:.3f}, R={metrics['recall']:.3f}, F1={metrics['f1_score']:.3f}, NDCG={metrics['ndcg']:.3f}")

print("\n✅ Evaluation completed!")

In [None]:
# Create comprehensive results summary
print("📊 COMPREHENSIVE EVALUATION RESULTS")
print("=" * 60)

# Create results DataFrame
results_data = []
for k_neighbors in k_neighbors_list:
    for k in k_values:
        metrics = evaluation_results[k_neighbors]['metrics'][k]
        results_data.append({
            'K_neighbors': k_neighbors,
            'K_eval': k,
            'Precision': metrics['precision'],
            'Recall': metrics['recall'],
            'F1_Score': metrics['f1_score'],
            'NDCG': metrics['ndcg'],
            'MAP': evaluation_results[k_neighbors]['map']
        })

results_df = pd.DataFrame(results_data)

print("\n📋 Detailed Results Table:")
print(results_df.round(3))

# Find best performing configurations
print("\n🏆 BEST PERFORMING CONFIGURATIONS:")
print("-" * 40)

for metric in ['F1_Score', 'Precision', 'Recall', 'NDCG']:
    best_idx = results_df[metric].idxmax()  # idxmax returns an index label
    best_config = results_df.loc[best_idx]  # so look the row up by label
    # Cast to int: the row Series is float-typed, which would print e.g. 3.0
    print(f"Best {metric}: K_neighbors={int(best_config['K_neighbors'])}, K_eval={int(best_config['K_eval'])}, Score={best_config[metric]:.3f}")

best_map_k = max(evaluation_results.keys(), key=lambda x: evaluation_results[x]['map'])
print(f"Best MAP: K_neighbors={best_map_k}, Score={evaluation_results[best_map_k]['map']:.3f}")

### ✅ **Checkpoint 7**: Evaluation Complete!
The full evaluation has run for every combination of K values.

## 📈 Step 8: Visualizing and Analyzing the Results

Let's visualize and analyze the evaluation results.

In [None]:
# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

# Create comprehensive visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Skincare Recommendation Algorithm Evaluation Results', fontsize=16, fontweight='bold')

metrics_to_plot = ['Precision', 'Recall', 'F1_Score', 'NDCG']
k_eval_values = [3, 5, 7, 10]

# Plot 1: Precision@K for different K_neighbors
ax1 = axes[0, 0]
for k_neighbors in k_neighbors_list:
    precision_values = [evaluation_results[k_neighbors]['metrics'][k]['precision'] for k in k_eval_values]
    ax1.plot(k_eval_values, precision_values, marker='o', linewidth=2, label=f'K_neighbors={k_neighbors}')
ax1.set_title('Precision@K', fontweight='bold')
ax1.set_xlabel('K (Evaluation)')
ax1.set_ylabel('Precision')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Recall@K for different K_neighbors
ax2 = axes[0, 1]
for k_neighbors in k_neighbors_list:
    recall_values = [evaluation_results[k_neighbors]['metrics'][k]['recall'] for k in k_eval_values]
    ax2.plot(k_eval_values, recall_values, marker='s', linewidth=2, label=f'K_neighbors={k_neighbors}')
ax2.set_title('Recall@K', fontweight='bold')
ax2.set_xlabel('K (Evaluation)')
ax2.set_ylabel('Recall')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: F1-Score@K for different K_neighbors
ax3 = axes[0, 2]
for k_neighbors in k_neighbors_list:
    f1_values = [evaluation_results[k_neighbors]['metrics'][k]['f1_score'] for k in k_eval_values]
    ax3.plot(k_eval_values, f1_values, marker='^', linewidth=2, label=f'K_neighbors={k_neighbors}')
ax3.set_title('F1-Score@K', fontweight='bold')
ax3.set_xlabel('K (Evaluation)')
ax3.set_ylabel('F1-Score')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: NDCG@K for different K_neighbors
ax4 = axes[1, 0]
for k_neighbors in k_neighbors_list:
    ndcg_values = [evaluation_results[k_neighbors]['metrics'][k]['ndcg'] for k in k_eval_values]
    ax4.plot(k_eval_values, ndcg_values, marker='d', linewidth=2, label=f'K_neighbors={k_neighbors}')
ax4.set_title('NDCG@K', fontweight='bold')
ax4.set_xlabel('K (Evaluation)')
ax4.set_ylabel('NDCG')
ax4.legend()
ax4.grid(True, alpha=0.3)

# Plot 5: MAP for different K_neighbors
ax5 = axes[1, 1]
map_values = [evaluation_results[k]['map'] for k in k_neighbors_list]
bars = ax5.bar([str(k) for k in k_neighbors_list], map_values, color='skyblue', alpha=0.7)
ax5.set_title('Mean Average Precision (MAP)', fontweight='bold')
ax5.set_xlabel('K_neighbors')
ax5.set_ylabel('MAP')
ax5.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for bar, value in zip(bars, map_values):
    ax5.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.001, 
             f'{value:.3f}', ha='center', va='bottom', fontweight='bold')

# Plot 6: Heatmap of F1-Scores
ax6 = axes[1, 2]
f1_matrix = np.zeros((len(k_neighbors_list), len(k_eval_values)))
for i, k_neighbors in enumerate(k_neighbors_list):
    for j, k_eval in enumerate(k_eval_values):
        f1_matrix[i, j] = evaluation_results[k_neighbors]['metrics'][k_eval]['f1_score']

im = ax6.imshow(f1_matrix, cmap='YlOrRd', aspect='auto')
ax6.set_title('F1-Score Heatmap', fontweight='bold')
ax6.set_xlabel('K (Evaluation)')
ax6.set_ylabel('K_neighbors')
ax6.set_xticks(range(len(k_eval_values)))
ax6.set_xticklabels(k_eval_values)
ax6.set_yticks(range(len(k_neighbors_list)))
ax6.set_yticklabels(k_neighbors_list)

# Add text annotations
for i in range(len(k_neighbors_list)):
    for j in range(len(k_eval_values)):
        ax6.text(j, i, f'{f1_matrix[i, j]:.3f}', ha='center', va='center', 
                color='white' if f1_matrix[i, j] > 0.5 else 'black', fontweight='bold')

plt.colorbar(im, ax=ax6, shrink=0.8)
plt.tight_layout()
plt.show()

print("📊 Visualization completed!")

In [None]:
# Detailed Performance Analysis
print("🔍 DETAILED PERFORMANCE ANALYSIS")
print("=" * 50)

# K-Sensitivity Analysis
print("\n📈 K-SENSITIVITY ANALYSIS:")
print("-" * 30)

for k_neighbors in k_neighbors_list:
    f1_scores = [evaluation_results[k_neighbors]['metrics'][k]['f1_score'] for k in k_eval_values]
    f1_variance = np.var(f1_scores)
    f1_mean = np.mean(f1_scores)
    
    print(f"\nK_neighbors = {k_neighbors}:")
    print(f"  📊 F1-Score Mean: {f1_mean:.3f}")
    print(f"  📊 F1-Score Variance: {f1_variance:.6f}")
    
    if f1_variance < 0.001:
        stability = "Very Stable"
    elif f1_variance < 0.005:
        stability = "Stable"
    else:
        stability = "Sensitive"
    
    print(f"  🎯 Stability: {stability}")

# Performance Ranking
print("\n🏆 PERFORMANCE RANKING:")
print("-" * 25)

# Rank by average F1-Score across all K values
k_performance = []
for k_neighbors in k_neighbors_list:
    avg_f1 = np.mean([evaluation_results[k_neighbors]['metrics'][k]['f1_score'] for k in k_eval_values])
    k_performance.append((k_neighbors, avg_f1))

k_performance.sort(key=lambda x: x[1], reverse=True)

for rank, (k_neighbors, avg_f1) in enumerate(k_performance, 1):
    print(f"{rank}. K_neighbors = {k_neighbors}: Avg F1-Score = {avg_f1:.3f}")

# Best configuration summary
best_k = k_performance[0][0]
best_metrics = evaluation_results[best_k]['metrics']

print(f"\n🎯 RECOMMENDED CONFIGURATION:")
print(f"   🔧 Optimal K_neighbors: {best_k}")
print(f"   📊 Performance at K=5:")
print(f"      • Precision@5: {best_metrics[5]['precision']:.3f}")
print(f"      • Recall@5: {best_metrics[5]['recall']:.3f}")
print(f"      • F1-Score@5: {best_metrics[5]['f1_score']:.3f}")
print(f"      • NDCG@5: {best_metrics[5]['ndcg']:.3f}")
print(f"   📊 MAP: {evaluation_results[best_k]['map']:.3f}")

### ✅ **Checkpoint 8**: Analysis Complete!
The visualization and in-depth analysis are done:
- **Performance plots** for every metric
- **K-sensitivity analysis** for stability
- **Performance ranking** and a recommended configuration

## 📋 Step 9: Conclusions and Recommendations

Let's draw the final conclusions from the evaluation.

In [None]:
# Final Conclusions and Recommendations
print("📋 FINAL CONCLUSIONS OF THE ALGORITHM EVALUATION")
print("=" * 60)

# Find overall best performing K
best_k = max(evaluation_results.keys(), 
            key=lambda x: np.mean([evaluation_results[x]['metrics'][k]['f1_score'] for k in k_eval_values]))
best_metrics = evaluation_results[best_k]['metrics']

print(f"\n🏆 BEST RESULTS:")
print(f"   🎯 Optimal K: K={best_k}")
print(f"   📊 F1-Score@5: {best_metrics[5]['f1_score']:.3f}")
print(f"   📊 Precision@5: {best_metrics[5]['precision']:.3f}")
print(f"   📊 Recall@5: {best_metrics[5]['recall']:.3f}")
print(f"   📊 NDCG@5: {best_metrics[5]['ndcg']:.3f}")
print(f"   📊 MAP: {evaluation_results[best_k]['map']:.3f}")

# Performance interpretation
print(f"\n💡 INTERPRETING THE RESULTS:")
f1_score = best_metrics[5]['f1_score']
if f1_score >= 0.7:
    performance_level = "Very Good"
    performance_desc = "The algorithm performs very well"
elif f1_score >= 0.5:
    performance_level = "Good"
    performance_desc = "The algorithm performs well and is reliable"
elif f1_score >= 0.3:
    performance_level = "Fair"
    performance_desc = "The algorithm performs adequately but can still be improved"
else:
    performance_level = "Needs Improvement"
    performance_desc = "The algorithm needs significant improvement"

print(f"   📈 Performance Level: {performance_level}")
print(f"   📝 Description: {performance_desc}")

# Algorithm characteristics
precision = best_metrics[5]['precision']
recall = best_metrics[5]['recall']
precision_recall_ratio = precision / recall if recall > 0 else 0

print(f"\n🔍 ALGORITHM CHARACTERISTICS:")
if precision_recall_ratio > 1.2:
    print(f"   🎯 The algorithm leans toward accuracy (precision-focused)")
    print(f"   💼 Suited to applications that prioritize recommendation accuracy")
elif precision_recall_ratio < 0.8:
    print(f"   📊 The algorithm leans toward coverage (recall-focused)")
    print(f"   💼 Suited to applications that prioritize recommendation completeness")
else:
    print(f"   ⚖️ The algorithm balances accuracy and completeness well")
    print(f"   💼 Suited to general applications that need balanced performance")

print(f"\n🚀 IMPLEMENTATION RECOMMENDATIONS:")
print(f"   • Use K_neighbors = {best_k} for the best results")
print(f"   • Monitor performance regularly")
print(f"   • Validate against real user data")
print(f"   • Consider A/B testing for validation")

print(f"\n" + "=" * 60)
print("🎉 Skincare recommendation algorithm evaluation complete!")
print("📊 All metrics have been evaluated and analyzed.")
print("🎯 A recommended configuration has been provided.")

### ✅ **Checkpoint 9**: Evaluation Complete!

🎉 **Congratulations!** You have completed the full evaluation of the skincare recommendation algorithm!

## 📊 **Summary of Achievements:**
- ✅ **Environment Setup** - Libraries and dependencies ready
- ✅ **Data Generation** - Synthetic dataset of 50 products & 20 users
- ✅ **Algorithm Implementation** - Hybrid CBF+KNN system
- ✅ **Ground Truth Creation** - Reference data for evaluation
- ✅ **Metrics Implementation** - Precision, Recall, F1, NDCG, MAP
- ✅ **Full Evaluation** - Testing across several values of K
- ✅ **Visualization & Analysis** - Plots and in-depth insights
- ✅ **Conclusions & Recommendations** - Implementation guidance

## 🎯 **Key Results:**
- **Optimal K** has been identified
- **Algorithm characteristics** have been analyzed
- **Implementation guidance** has been provided
- **Performance metrics** have been evaluated comprehensively

## 🚀 **Next Steps:**
1. **Implement** the system with the recommended K
2. **Test** with real user data
3. **Monitor** performance in production
4. **Iterate** based on user feedback

---
**Thank you for following this evaluation! 🙏**

## 📚 Appendix: Additional Information

### 🔍 **Evaluation Metrics Explained:**

**Precision@K**: Measures the proportion of relevant items among the top-K recommendations
- Formula: `Relevant items in top-K / K`
- Interpretation: How accurate the recommendations are

**Recall@K**: Measures the proportion of all relevant items captured in the top-K recommendations
- Formula: `Relevant items in top-K / Total relevant items`
- Interpretation: How completely the recommendations cover the relevant items

**F1-Score@K**: Harmonic mean of Precision and Recall
- Formula: `2 * (Precision * Recall) / (Precision + Recall)`
- Interpretation: Balance between accuracy and completeness
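
A small worked example may make these three formulas concrete. The sketch below computes all three on one made-up ranked list; the function name and the product IDs are hypothetical.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Compute Precision@K, Recall@K and F1@K for one ranked list."""
    relevant_set = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant_set)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical ranked list and relevant set
recommended = ["P1", "P2", "P3", "P4", "P5"]
relevant = ["P2", "P5", "P9", "P7"]
p, r, f1 = precision_recall_f1_at_k(recommended, relevant, k=5)
print(f"P@5={p:.2f}, R@5={r:.2f}, F1@5={f1:.3f}")
```

Two of the five recommendations are relevant, out of four relevant items overall, so this prints `P@5=0.40, R@5=0.50, F1@5=0.444`.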

**NDCG@K**: Normalized Discounted Cumulative Gain
- Measures ranking quality by taking item position into account
- Interpretation: How good the ordering of the recommendations is
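
The sketch below follows the DCG variant used in this notebook's `dcg_at_k`: the score at rank 1 is undiscounted and the score at rank i (i ≥ 2) is divided by log2(i). The function names and relevance values here are illustrative assumptions.

```python
import numpy as np

def dcg(scores):
    """DCG variant: rank 1 undiscounted, rank i >= 2 divided by log2(i)."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        return 0.0
    ranks = np.arange(2, scores.size + 1)
    return float(scores[0] + np.sum(scores[1:] / np.log2(ranks)))

def ndcg_at_k(ranked_relevance, all_relevance, k):
    """Normalize DCG@K by the DCG of the ideal (descending) ordering."""
    ideal = sorted(all_relevance, reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(ranked_relevance[:k]) / idcg if idcg > 0 else 0.0

# Hypothetical graded relevance of the items, in recommended order:
# the best item (1.0) was placed last
ranked = [0.0, 0.5, 1.0]
print(round(ndcg_at_k(ranked, all_relevance=[1.0, 0.5], k=3), 3))
```

Because log2(2) = 1, ranks 1 and 2 carry equal weight in this variant. A common alternative divides the score at rank i by log2(i + 1), which discounts every position after the first.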

**MAP**: Mean Average Precision
- The mean of the Average Precision over all users
- Interpretation: Overall performance of the recommendation system
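
As a minimal sketch of how MAP is built up, the example below computes Average Precision for two made-up users and then averages the results. The user IDs, item labels, and helper name are hypothetical.

```python
def average_precision(recommended, relevant):
    """Average of Precision@i over the ranks i where a relevant item appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(recommended, start=1):
        if item in relevant_set:
            hits += 1
            total += hits / i
    return total / len(relevant_set)

# Hypothetical ranked lists and relevant sets for two users
users = {
    "U1": (["A", "B", "C"], ["A", "C"]),  # hits at ranks 1 and 3
    "U2": (["A", "B", "C"], ["B"]),       # hit at rank 2
}
ap = {u: average_precision(rec, rel) for u, (rec, rel) in users.items()}
map_score = sum(ap.values()) / len(ap)
print(ap, round(map_score, 3))
```

Here U1 scores (1/1 + 2/3) / 2 ≈ 0.833 and U2 scores 0.5, giving a MAP of about 0.667.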

### 🤖 **How the Hybrid CBF+KNN Algorithm Works:**

1. **Content-Based Filtering**: Computes the similarity between user profiles and products using TF-IDF
2. **K-Nearest Neighbors**: Finds the K nearest users by preference similarity
3. **Hybrid Scoring**: Combines the CBF score with a KNN boost, plus a penalty for products outside the neighbors' top-K
4. **Ranking**: Sorts products by final score
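
The scoring and ranking steps can be sketched on a handful of numbers. This is a simplified illustration, not the notebook's `aggressive_knn_scoring`: it keeps the position-based boost and the clamped penalty but omits the neighbor-rank bonus, and the function name, default parameters, and scores are assumptions.

```python
import numpy as np

def hybrid_scores(cbf_scores, neighbor_rankings, k,
                  position_multiplier=0.25, penalty=0.04):
    """Boost products in each neighbor's top-k by rank; penalize the rest."""
    scores = np.array(cbf_scores, dtype=float)
    for ranking in neighbor_rankings:
        for pos, product_idx in enumerate(ranking):
            if pos < k:
                # Higher-ranked products get a larger boost
                scores[product_idx] += position_multiplier * (k - pos) / k
            else:
                # Penalized scores are clamped at zero
                scores[product_idx] = max(0.0, scores[product_idx] - penalty)
    return scores

cbf = [0.30, 0.10, 0.25, 0.05]  # hypothetical CBF scores for 4 products
neighbors = [[2, 0, 1, 3]]      # one neighbor's product ranking, best first
final = hybrid_scores(cbf, neighbors, k=2)
print(final.round(3), final.argsort()[::-1])
```

Product 0 has the highest CBF score, but the neighbor's top pick (product 2) overtakes it after the boost: the final scores are 0.425 and 0.5 respectively.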

### 📊 **Interpreting K:**

- **Small K (3-5)**: More selective, focuses on very similar users
- **Medium K (5-7)**: Balance between selectivity and diversity
- **Large K (7-10)**: More inclusive, considers more users
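
In this notebook's `aggressive_knn_scoring`, K also selects the strength of the boosts and penalties: smaller K uses larger values. The helper name `knn_params` is hypothetical; the thresholds and values are copied from that method.

```python
def knn_params(k):
    """Return (position_multiplier, exclusivity_bonus, exclusion_penalty) for K."""
    if k <= 3:
        return 0.3, 0.4, 0.6
    if k <= 5:
        return 0.25, 0.3, 0.4
    return 0.2, 0.2, 0.2

# Show the parameters for the K values evaluated in the notebook
for k in (3, 5, 7, 10):
    print(k, knn_params(k))
```

Note that the code switches parameters at K=3 and K=5, so K=7 and K=10 share the same settings.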

### ⚠️ **Evaluation Limitations:**

1. **Synthetic Dataset**: Results may differ on real data
2. **Simple Ground Truth**: Uses basic matching rules
3. **Small Scale**: 50 products and 20 users, for demo purposes
4. **No Temporal Factor**: Does not account for preferences changing over time
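
With only 20 users, averaged metrics can be noisy. One way to gauge that uncertainty, offered here as an addition rather than something the notebook does, is a percentile bootstrap over per-user scores. The per-user F1 values and the function name below are made up for illustration.

```python
import numpy as np

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample users with replacement and record each resample's mean
    samples = rng.choice(values, size=(n_resamples, values.size), replace=True)
    means = samples.mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)

per_user_f1 = [0.2, 0.4, 0.0, 0.5, 0.3, 0.6, 0.1, 0.4, 0.3, 0.2]  # hypothetical
low, high = bootstrap_ci(per_user_f1)
print(f"mean={np.mean(per_user_f1):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A wide interval suggests that differences between configurations may not be meaningful at this sample size.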

### 🔧 **Production Implementation Tips:**

1. **Scalability**: Use approximate nearest neighbor search for large datasets
2. **Real-time**: Add caching for fast response times
3. **Cold Start**: Prepare a strategy for new users/products
4. **Feedback Loop**: Collect implicit/explicit feedback
5. **A/B Testing**: Validate performance with controlled experiments
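
For the cold-start tip, one simple fallback (an assumption here, not something the notebook implements) is to recommend the highest-rated products when a new user has no profile yet, breaking ties by lower price. The function name and the tiny DataFrame are hypothetical.

```python
import pandas as pd

def cold_start_recommend(products, top_n=3):
    """Fallback for users with no profile: highest-rated products first."""
    # Sort by rating (descending), then by price (ascending) to break ties
    ranked = products.sort_values(["rating", "price"], ascending=[False, True])
    return ranked["product_id"].head(top_n).tolist()

toy_products = pd.DataFrame({
    "product_id": ["P001", "P002", "P003", "P004"],
    "rating": [4.2, 4.9, 3.8, 4.9],
    "price": [120000, 250000, 90000, 180000],
})
print(cold_start_recommend(toy_products, top_n=2))
```

P002 and P004 share the top rating, so the cheaper P004 is listed first.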