# 🎬 Introduction to Recommendation Systems
## Collaborative Filtering with MovieLens Dataset

**CCO 460 - Inteligencia Artificial**  
**Universidad del Sagrado Corazón**

---

## 📚 Learning Objectives

By the end of this lesson, you will understand:

1. **What** recommendation systems are and why they matter
2. **How** user-based collaborative filtering works (conceptually)
3. **How** item-based collaborative filtering works (conceptually)
4. **Why** these algorithms make good recommendations
5. **When** to use each approach
6. **How** recommendation systems connect to neural networks and statistics

---

## 🎯 What You'll Build

We'll implement a **movie recommendation system** using the MovieLens dataset. Imagine you're building the recommendation engine for Netflix or Amazon Prime!

**Example:**
- **User Jane** has watched: "The Godfather", "Star Wars", "Jaws"
- **Question:** What movies should we recommend to Jane?

---

## 📖 Reference to AI Explorer

This notebook complements the **AI Explorer** platform. For conceptual overviews and additional resources:

- 📌 **Main Page:** `/recommendation-systems` - Introduction and real-world applications
- 📐 **Similarity Metrics:** Explained in the glossary section
- 🧠 **Neural Networks Connection:** How modern deep learning enhances collaborative filtering
- 📊 **Statistics Connection:** Mathematical foundations and statistical principles

---

## 🗂️ Table of Contents

1. [Introduction: The Recommendation Problem](#1.-Introduction)
2. [Setup: Loading MovieLens Data](#2.-Setup)
3. [User-Based Collaborative Filtering](#3.-User-Based-CF)
4. [Item-Based Collaborative Filtering](#4.-Item-Based-CF)
5. [Comparison: User vs Item](#5.-Comparison)
6. [Connections to AI](#6.-Connections)
7. [Glossary](#7.-Glossary)

---

# 1. Introduction: The Recommendation Problem

## 🤔 The Problem

Imagine you run a video streaming service with **10,000 movies**. A new user signs up and watches just 5 movies. **How do you recommend the 6th movie?**

## 💡 The Insight

**Collaborative Filtering** makes recommendations based on patterns in what people like:

### Two Main Approaches:

**1. User-Based: "People like you also liked..."**
- Find users with similar taste to yours
- Recommend movies they enjoyed
- *Classic example*: the original GroupLens news recommender, which matched readers with similar rating histories

**2. Item-Based: "If you liked this, you'll like..."**
- Find movies similar to ones you liked
- Recommend those similar movies
- *Familiar examples*: "Because you watched The Godfather..." on Netflix and "Customers who bought this also bought..." on Amazon

## 🌟 Real-World Impact

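Commonly cited industry estimates (approximate; they vary by source and year):

- **Netflix:** roughly 75% of viewing reportedly comes from recommendations
- **Amazon:** roughly 35% of revenue is often attributed to recommendations
- **Spotify:** roughly 40% of discoveries reportedly come through personalized playlists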

---

# 2. Setup: Loading MovieLens Data

## 📦 About MovieLens Dataset

MovieLens is the industry-standard dataset for learning recommendation systems:
- **Created by:** GroupLens Research (University of Minnesota)
- **Contains:** Movie ratings from real users
- **Files:** `movies.csv` (movie titles) + `ratings.csv` (user ratings)

### Available at:
- https://grouplens.org/datasets/movielens/
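- Download the **MovieLens Latest Small** (`ml-latest-small`) release, which ships `movies.csv` and `ratings.csv` with the columns used below; the older **MovieLens 100K** and **MovieLens 1M** releases use different file names and formats (for example `u.data` and `ratings.dat`) and would need a different loading step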

---

In [None]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ Libraries imported successfully!")

In [None]:
# Load the MovieLens dataset
# NOTE: Download movies.csv and ratings.csv from https://grouplens.org/datasets/movielens/
# Place them in the same directory as this notebook

try:
    movies = pd.read_csv('movies.csv')
    ratings = pd.read_csv('ratings.csv')
    print("✅ Data loaded successfully!")
    print(f"\n📊 Dataset Stats:")
    print(f"   - Movies: {len(movies):,}")
    print(f"   - Ratings: {len(ratings):,}")
    print(f"   - Users: {ratings['userId'].nunique():,}")
except FileNotFoundError:
    print("❌ Error: movies.csv or ratings.csv not found!")
    print("   Please download from: https://grouplens.org/datasets/movielens/")

In [None]:
# Examine the data structure
print("🎬 Sample Movies:")
display(movies.head())

print("\n⭐ Sample Ratings:")
display(ratings.head())

## 🗃️ Data Structure Explanation

**movies.csv:**
- `movieId`: Unique identifier for each movie
- `title`: Movie name and year
- `genres`: Pipe-separated categories (e.g., `Action|Comedy|Drama`)

**ratings.csv:**
- `userId`: Unique identifier for each user
- `movieId`: Which movie was rated
- `rating`: Score from 0.5 to 5.0 (higher = better)
- `timestamp`: When the rating was made

---

In [None]:
# Merge movies and ratings to create our working dataset
data = pd.merge(movies, ratings, on='movieId')

# Create the user-item matrix (the foundation of collaborative filtering)
# Rows = Users, Columns = Movies, Values = Ratings
user_item_matrix = data.pivot_table(
    index='userId',
    columns='title',
    values='rating'
)

print(f"📊 User-Item Matrix Shape: {user_item_matrix.shape}")
print(f"   → {user_item_matrix.shape[0]} users × {user_item_matrix.shape[1]} movies")

# Fill missing values with 0 (user hasn't rated that movie)
user_item_matrix_filled = user_item_matrix.fillna(0)

print("\n✅ User-Item Matrix created!")

## 📐 The User-Item Matrix Explained

This matrix is the **core data structure** for collaborative filtering:

```
          The Godfather | Star Wars | Jaws | Titanic | ...
User 1         5.0      |    4.5    |  3.0 |   0     | ...
User 2         4.0      |    5.0    |  4.5 |   4.0   | ...
User 3         0        |    4.0    |  0   |   5.0   | ...
...
```

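**Key Insight:** Most entries are missing (users haven't rated most movies); they show up as 0 only because we filled the gaps
- This is called a **sparse matrix**
- A 0 here means "not rated", not "rated zero stars"
- Our job: Predict the missing values!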
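
To make the pivot step concrete, here is a minimal sketch on a made-up three-user table. The user IDs and ratings are illustrative assumptions, not values from MovieLens; only the `pivot_table` call mirrors the cell above.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) pair
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Jaws", "Star Wars", "Jaws", "Titanic", "Titanic"],
    "rating": [3.0, 4.5, 4.5, 4.0, 5.0],
})

# Pivot to wide format: rows = users, columns = movies, NaN = not rated
toy_matrix = toy_ratings.pivot_table(index="userId", columns="title", values="rating")

# Fraction of cells that hold no rating
toy_sparsity = toy_matrix.isna().to_numpy().mean()

print(toy_matrix)
print(f"Sparsity: {toy_sparsity:.2f}")
```

Even in this tiny table, 4 of the 9 cells are empty; real rating data is far sparser.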

---

In [None]:
# Visualize matrix sparsity
sparsity = 1 - (user_item_matrix.notna().sum().sum() / user_item_matrix.size)
print(f"🔍 Matrix Sparsity: {sparsity*100:.2f}%")
print(f"   → On average, a user has rated only {(1-sparsity)*100:.1f}% of the movies!")

# Sample visualization: First 50 users and 50 movies
plt.figure(figsize=(12, 8))
sample = user_item_matrix.iloc[:50, :50].notna().astype(int)
sns.heatmap(sample, cmap='YlOrRd', cbar=True, yticklabels=False, xticklabels=False)
plt.title('User-Item Matrix Visualization\n(Light = No Rating, Dark Red = Has Rating)', fontsize=14)
plt.xlabel('Movies (sample of 50)', fontsize=12)
plt.ylabel('Users (sample of 50)', fontsize=12)
plt.tight_layout()
plt.show()

print("\n💡 Notice: Lots of light cells = sparse data!")

---

# 3. User-Based Collaborative Filtering

## 🎯 Core Idea: "People like you also liked..."

### How It Works (Conceptually):

1. **Find similar users** to the target user
2. **Look at what they liked** that the target user hasn't seen
3. **Recommend those movies**

---

## 🤝 Step-by-Step Example: Recommending for User Jane

**Jane's Ratings:**
- The Godfather: ⭐⭐⭐⭐⭐ (5.0)
- Star Wars: ⭐⭐⭐⭐ (4.0)
- Jaws: ⭐⭐⭐⭐ (4.5)

**Step 1: Find Similar Users**

We compare Jane's ratings with all other users using **similarity metrics** (see Glossary):

```
User    | Similarity to Jane | Their Ratings
--------|-------------------|--------------
Bob     | 0.92 (very high!) | Loved same movies + also loved "Pulp Fiction"
Alice   | 0.88 (high!)      | Similar taste + loved "The Shawshank Redemption"
Charlie | 0.12 (low)        | Completely different taste
```

**Step 2: Get Recommendations**

Jane should watch:
1. **Pulp Fiction** (Bob's favorite, high similarity)
2. **The Shawshank Redemption** (Alice's favorite, high similarity)

We **ignore** Charlie's recommendations (low similarity = different taste).

---

## 📐 The Math: Cosine Similarity

**See AI Explorer:** `/recommendation-systems` → Glossary → Cosine Similarity

```
similarity = cos(θ) = (A · B) / (||A|| × ||B||)
```

**Intuitive Explanation:**
- Imagine each user as a vector in "movie rating space"
- Similar users point in similar directions
- Cosine similarity measures the angle between vectors
- Value close to 1 = very similar, close to 0 = not similar

**Why this works:** Users with similar rating patterns have similar taste!
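
The formula can be checked by hand with a few lines of NumPy. This is a minimal sketch: `cosine_sim` is a hypothetical helper written for illustration, Jane's ratings come from the example above, and Bob's and Charlie's ratings are made up.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Ratings for [The Godfather, Star Wars, Jaws]
jane = [5.0, 4.0, 4.5]
bob = [5.0, 4.5, 4.0]      # hypothetical: similar pattern
charlie = [1.0, 0.0, 5.0]  # hypothetical: different pattern (0 = not rated)

print(f"Jane vs Bob:     {cosine_sim(jane, bob):.3f}")
print(f"Jane vs Charlie: {cosine_sim(jane, charlie):.3f}")
```

Because ratings are never negative, the result always falls between 0 and 1 here.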

---

In [None]:
# Implementation: User-Based Collaborative Filtering

# Step 1: Calculate user similarity matrix
user_similarity = cosine_similarity(user_item_matrix_filled)
user_similarity_df = pd.DataFrame(
    user_similarity,
    index=user_item_matrix.index,
    columns=user_item_matrix.index
)

print("✅ User similarity matrix calculated!")
print(f"   Shape: {user_similarity_df.shape}")
print("\n📊 Sample similarities for User 1:")
print(user_similarity_df.iloc[0].sort_values(ascending=False).head(10))

In [None]:
def recommend_movies_user_based(user_id, matrix, similarity_matrix, n_recommendations=5):
    """
    Generate movie recommendations using User-Based Collaborative Filtering.
    
    Algorithm:
    1. Find users most similar to target user
    2. Get movies they rated highly that target user hasn't seen
    3. Rank by weighted average (similarity * rating)
    
    Args:
        user_id: Target user
        matrix: User-item rating matrix
        similarity_matrix: User-user similarity scores
        n_recommendations: Number of movies to recommend
        
    Returns:
        pandas Series of scores indexed by movie title, highest first
    """
    # Get similarity scores for the target user, dropping the user's own
    # entry explicitly (relying on position breaks when scores tie at 1.0)
    user_similarities = similarity_matrix[user_id].drop(user_id).sort_values(ascending=False)
    
    # Keep the top 10 most similar users
    similar_users = user_similarities.head(10).index
    
    # Get movies rated by similar users
    similar_users_ratings = matrix.loc[similar_users]
    
    # Weighted average: sum(similarity * rating) / sum(similarity).
    # Unrated movies count as 0 here, so this is a ranking score rather
    # than a predicted star rating.
    weights = user_similarities.loc[similar_users]
    weighted_ratings = similar_users_ratings.T.dot(weights) / weights.sum()
    
    # Remove movies already rated by target user
    user_rated_movies = matrix.loc[user_id][matrix.loc[user_id] > 0].index
    recommendations = weighted_ratings.drop(user_rated_movies, errors='ignore')
    
    # Return top N recommendations
    return recommendations.sort_values(ascending=False).head(n_recommendations)

# Test the recommendation function
target_user = 1
user_recs = recommend_movies_user_based(
    target_user, 
    user_item_matrix_filled, 
    user_similarity_df,
    n_recommendations=10
)

print(f"🎬 Top 10 Recommendations for User {target_user} (User-Based CF):\n")
for i, (movie, score) in enumerate(user_recs.items(), 1):
    print(f"{i:2d}. {movie[:60]:<60} (Score: {score:.3f})")

## 🎓 Why User-Based CF Works

### The Psychology:
- **Homophily:** "Birds of a feather flock together"
- People with similar backgrounds/interests tend to like similar things
- If someone shares 80% of your taste, they're likely right about the other 20%

### The Statistics:
- **Correlation:** Measures how users' ratings move together
- **K-Nearest Neighbors:** Find the k most similar users
- **Weighted Average:** Give more weight to more similar users

### Strengths:
- ✓ Explains recommendations ("Users like you...")
- ✓ Captures changing tastes
- ✓ Good for diverse item catalogs

### Weaknesses:
- ✗ Slow with many users (must compare to all users)
- ✗ Cold start problem (new users have no ratings)
- ✗ Sparsity issues (most users rate few items)

---


# 4. Item-Based Collaborative Filtering

## 🎯 Core Idea: "If you liked this, you'll like..."

### How It Works (Conceptually):

1. **Find movies similar** to ones the user liked
2. **Rank by similarity**
3. **Recommend the most similar unwatched movies**

---

## 🎬 Step-by-Step Example: Recommending Based on "The Godfather"

**User Jane loved "The Godfather" (⭐⭐⭐⭐⭐)**

**Step 1: Find Similar Movies**

We compare "The Godfather" with all other movies based on user ratings:

```
Movie                    | Similarity to Godfather | Why?
-------------------------|------------------------|-------
Godfather Part II        | 0.95 (very high!)      | Same users love both
The Shawshank Redemption | 0.87 (high!)           | Fans of drama overlap
Star Wars                | 0.23 (low)             | Different audience
```

**Step 2: Recommend**

Jane should watch:
1. **Godfather Part II** (almost identical audience)
2. **The Shawshank Redemption** (similar drama fans)

We **ignore** Star Wars (different audience = low similarity).

---

## 📐 The Math: Item Similarity

**See AI Explorer:** `/recommendation-systems` → Glossary → Adjusted Cosine Similarity

Instead of comparing users, we:
1. **Transpose the matrix** (rows=movies, columns=users)
2. **Calculate cosine similarity** between movie vectors
3. Each movie is now represented by how users rated it

**Intuitive Explanation:**
- Two movies are similar if the same users rated them similarly
- "The Godfather" and "Godfather II" → rated by same users with similar scores
- "The Godfather" and "Frozen" → rated by different users = not similar

**Why this works:** Items liked by the same people are probably similar!
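
The transpose-then-compare idea can be sketched on a toy matrix. The ratings below are made up for illustration, and the computation uses plain NumPy rather than the `cosine_similarity` helper used in this notebook.

```python
import numpy as np

# Hypothetical filled user-item matrix: rows = users, columns = movies (0 = not rated)
# Columns: [The Godfather, Godfather II, Frozen]
R = np.array([
    [5.0, 5.0, 0.0],
    [4.0, 4.5, 0.0],
    [0.0, 0.0, 5.0],
    [1.0, 0.0, 4.0],
])

items = R.T                                          # rows = movies, columns = users
norms = np.linalg.norm(items, axis=1, keepdims=True)
norms[norms == 0] = 1.0                              # guard against movies with no ratings
unit = items / norms                                 # scale each movie vector to length 1
item_sim = unit @ unit.T                             # cosine similarity for every movie pair

print(np.round(item_sim, 3))
```

The two Godfather columns share raters and scores, so their similarity is high; Frozen overlaps with them through only one user and scores low.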

---

In [None]:
# Implementation: Item-Based Collaborative Filtering

# Step 1: Calculate item similarity matrix
# Transpose the matrix: now rows=movies, columns=users
item_similarity = cosine_similarity(user_item_matrix_filled.T)
item_similarity_df = pd.DataFrame(
    item_similarity,
    index=user_item_matrix.columns,
    columns=user_item_matrix.columns
)

print("✅ Item similarity matrix calculated!")
print(f"   Shape: {item_similarity_df.shape}")
print("\n📊 Movies most similar to 'The Godfather':")
if 'Godfather, The (1972)' in item_similarity_df.index:
    godfather_sims = item_similarity_df['Godfather, The (1972)'].sort_values(ascending=False)
    for i, (movie, sim) in enumerate(godfather_sims.head(10).items(), 1):
        print(f"{i:2d}. {movie[:60]:<60} (Similarity: {sim:.3f})")
else:
    print("   Note: 'The Godfather' not found in this dataset")

In [None]:
def recommend_movies_item_based(user_id, matrix, similarity_matrix, n_recommendations=5):
    """
    Generate movie recommendations using Item-Based Collaborative Filtering.
    
    Algorithm:
    1. Get all movies rated by target user
    2. For each rated movie, find similar movies
    3. Aggregate similarity scores weighted by user's rating
    4. Recommend highest-scoring unwatched movies
    
    Args:
        user_id: Target user
        matrix: User-item rating matrix
        similarity_matrix: Item-item similarity scores
        n_recommendations: Number of movies to recommend
        
    Returns:
        pandas Series of scores indexed by movie title, highest first
    """
    # Get movies rated by user
    user_ratings = matrix.loc[user_id]
    rated_movies = user_ratings[user_ratings > 0]
    
    # Calculate scores for all movies (float dtype so weighted sums add cleanly)
    scores = pd.Series(0.0, index=matrix.columns)
    
    for movie, rating in rated_movies.items():
        # Get similar movies
        similar_movies = similarity_matrix[movie]
        
        # Weight by user's rating (loved movies contribute more)
        scores += similar_movies * rating
    
    # Remove already-rated movies
    scores = scores.drop(rated_movies.index, errors='ignore')
    
    # Return top N; scores are unnormalized sums, so they rank movies
    # but are not on the 0.5-5.0 rating scale
    return scores.sort_values(ascending=False).head(n_recommendations)

# Test the recommendation function
item_recs = recommend_movies_item_based(
    target_user,
    user_item_matrix_filled,
    item_similarity_df,
    n_recommendations=10
)

print(f"🎬 Top 10 Recommendations for User {target_user} (Item-Based CF):\n")
for i, (movie, score) in enumerate(item_recs.items(), 1):
    print(f"{i:2d}. {movie[:60]:<60} (Score: {score:.3f})")

## 🎓 Why Item-Based CF Works

### The Psychology:
- **Transitive preferences:** If you like A, and A is similar to B, you'll probably like B
- **Genre consistency:** Action fans like action movies, drama fans like drama
- **Quality signals:** Good movies share similar quality indicators

### The Statistics:
- **Item correlation:** Movies rated similarly by users are related
- **Clustering:** Items naturally cluster by genre, style, quality
- **Stable patterns:** Item relationships change slowly (unlike user tastes)

### Strengths:
- ✓ **Fast:** Pre-compute item similarities once, reuse for all users
- ✓ **Scalable:** More users = better similarity estimates
- ✓ **Stable:** Movie relationships don't change often
- ✓ **Explainable:** "Because you watched X"

### Weaknesses:
- ✗ Cold start for new items (no ratings yet)
- ✗ Limited serendipity (recommends similar, not diverse)
- ✗ Popularity bias (popular items dominate)

---


# 5. Comparison: User-Based vs Item-Based

## 📊 Side-by-Side Comparison

| Aspect | User-Based CF | Item-Based CF |
|--------|--------------|---------------|
| **Core Idea** | Find similar users | Find similar items |
| **Example** | "Users like you..." | "Because you watched..." |
| **Computation** | Compare users | Compare items |
| **Speed** | Slow (many users) | Fast (pre-compute) |
| **Scalability** | Poor with users | Good with items |
| **Best For** | Small # users, diverse items | Large # users, stable items |
| **Diversity** | High (varied tastes) | Lower (similar items) |
| **Cold Start** | Struggles with new users | Struggles with new items |
| **Real Example** | Early GroupLens news recommender | Amazon ("Customers also bought"), Netflix ("Because you watched") |

---

## 🤔 When to Use Which?

### Choose User-Based When:
- Fewer users than items
- User preferences change quickly
- Want to discover diverse content
- Social element is important

### Choose Item-Based When:
- More users than items
- Item catalog is relatively stable
- Performance is critical
- Want consistency in recommendations

### Real-World Choices:
- **Amazon:** Item-based (millions of products, stable relationships)
- **Netflix:** Item-based (thousands of movies, pre-compute overnight)
- **Spotify:** Hybrid (combine both + content features)

---

In [None]:
# Compare recommendations for the same user
print("🔍 COMPARISON: Recommendations for User 1\n")
print("="*80)

print("\n👥 USER-BASED: 'People like you also liked...'\n")
user_recs = recommend_movies_user_based(target_user, user_item_matrix_filled, user_similarity_df, 5)
for i, (movie, score) in enumerate(user_recs.items(), 1):
    print(f"{i}. {movie[:65]}")

print("\n\n🎬 ITEM-BASED: 'Because you watched...'\n")
item_recs = recommend_movies_item_based(target_user, user_item_matrix_filled, item_similarity_df, 5)
for i, (movie, score) in enumerate(item_recs.items(), 1):
    print(f"{i}. {movie[:65]}")

print("\n" + "="*80)
print("\n💡 Notice: Both methods give good recommendations, but rankings differ!")

## 🔄 Hybrid Approaches

**Most real systems combine both methods!**

### Simple Hybrid:
```python
final_score = 0.5 * user_based_score + 0.5 * item_based_score
```
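
One practical wrinkle: the two functions in this notebook return scores on different scales (a weighted average of ratings versus a sum of similarity-weighted ratings), so averaging raw scores would let one method dominate. A possible workaround, sketched below, rescales each to [0, 1] first. `blend_scores`, `alpha`, and the sample scores are hypothetical and only illustrate the idea.

```python
import pandas as pd

def blend_scores(user_scores, item_scores, alpha=0.5):
    """Min-max scale two score Series to [0, 1], then take a weighted average.

    Movies missing from one Series contribute 0 from that side.
    """
    def minmax(s):
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else s * 0.0

    u, i = minmax(user_scores), minmax(item_scores)
    combined = (alpha * u).add((1 - alpha) * i, fill_value=0.0)
    return combined.sort_values(ascending=False)

# Hypothetical scores on very different scales
user_scores = pd.Series({"Pulp Fiction": 4.2, "Titanic": 1.0, "Alien": 3.0})
item_scores = pd.Series({"Pulp Fiction": 60.0, "Titanic": 20.0, "Heat": 80.0})

blended = blend_scores(user_scores, item_scores)
print(blended)
```

Setting `alpha` closer to 1 favors the user-based ranking; closer to 0 favors the item-based one.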

### Intelligent Hybrid:
1. **Use item-based** for most recommendations (fast, scalable)
2. **Use user-based** for discovery and diversity
3. **Combine with content features** (genres, actors, directors)
4. **Apply business rules** (new releases, trending, regional)

### Modern Approaches:
- **Matrix Factorization:** Decompose the user-item matrix (a core component of the Netflix Prize-winning ensembles)
- **Deep Learning:** Neural networks learn representations (see Section 6)
- **Context-Aware:** Time, location, device, mood

---


# 6. Connections to Broader AI Concepts

## 🧠 Connection to Neural Networks

**See AI Explorer:** `/recommendation-systems` → "Connections to Broader AI Concepts"

### Evolution Timeline:

**1990s - Collaborative Filtering:**
- User-user and item-item similarity
- Cosine, Pearson correlation
- What you learned today!

**2000s - Matrix Factorization:**
```
User-Item Matrix → User Factors × Item Factors
```
- **Netflix Prize (2009):** $1M for 10% improvement
- Latent factors capture hidden patterns
- SVD, ALS, gradient descent
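
As a rough illustration of the factorization idea, the sketch below applies a truncated SVD to a small made-up matrix. It treats unrated cells as zeros, which is a simplification: methods such as ALS or gradient descent normally fit only the observed ratings. The matrix and the choice `k = 2` are assumptions for the example.

```python
import numpy as np

# Hypothetical filled user-item matrix (0 = not rated); rows = users, columns = movies
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

k = 2  # number of latent factors to keep (arbitrary for this toy example)
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Split the singular values evenly between the two factor matrices
user_factors = U[:, :k] * np.sqrt(s[:k])      # shape: (n_users, k)
item_factors = Vt[:k, :].T * np.sqrt(s[:k])   # shape: (n_items, k)
R_hat = user_factors @ item_factors.T         # rank-k approximation of R

print(np.round(R_hat, 2))
```

Each user and each movie is now described by just `k` numbers, and their dot product approximates the original rating.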

**2010s - Deep Learning:**
```
User → Embedding Layer → Neural Network → Item Score
```
- **Neural Collaborative Filtering (NCF)**
- Learn user and item embeddings
- Capture non-linear interactions

**2020s - Transformers:**
```
User History → Attention Mechanism → Recommendations
```
- Sequential recommendations (what to watch next)
- Context-aware (time, device, mood)
- Multi-modal (text, images, videos)

### How Neural Networks Enhance CF:

**1. Embeddings:**
- Traditional CF: Sparse user-item matrix
- Neural CF: Dense embedding vectors
- **Example:** User = [0.2, 0.8, 0.1, ...] (learned automatically)

**2. Non-linear Patterns:**
- Traditional CF: Linear relationships (dot products)
- Neural CF: Non-linear (neural network layers)
- **Example:** "Loves action" + "Avoids romance" → Complex preference

**3. Multiple Data Sources:**
- Traditional CF: Ratings only
- Neural CF: Ratings + text reviews + images + click behavior
- **Example:** YouTube combines watch history + video content + thumbnails

---

## 📊 Connection to Statistics

**See AI Explorer:** `/recommendation-systems` → "Statistical Foundations"

### Collaborative Filtering IS Statistics!

**1. Correlation & Covariance:**
```
Pearson Correlation = Cov(X, Y) / (σ_X × σ_Y)
```
- Measures how users' ratings move together
- High correlation = similar taste
- **Used in:** User-based CF with Pearson similarity
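
A minimal sketch of Pearson similarity between two users, computed only over the movies both have rated. `pearson_sim` is a hypothetical helper, the ratings are made up, and centering on the co-rated mean is one common convention (some formulations use each user's overall mean instead).

```python
import numpy as np

def pearson_sim(a, b):
    """Pearson correlation over items both users rated (NaN = not rated)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0                      # too little overlap to correlate
    x = a[both] - a[both].mean()        # center each user's co-rated scores
    y = b[both] - b[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

# Hypothetical users: the lenient rater scores every shared movie 1 star higher
harsh = [2.0, 3.0, 1.0, np.nan]
lenient = [3.0, 4.0, 2.0, 5.0]

print(pearson_sim(harsh, lenient))
```

Because each user's scores are centered before comparison, a constant offset between a harsh and a lenient rater does not reduce their similarity.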

**2. Regression:**
```
Predicted Rating = β₀ + β₁×similarity + ε
```
- Predict missing ratings from similar users/items
- The weighted average can be viewed as a simple linear model whose weights come from similarity scores
- **Used in:** Rating prediction in both CF approaches
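
For reference, one standard way to write the similarity-weighted prediction is shown below. The notation is an assumption introduced here: $N_k(u)$ is the set of the $k$ users most similar to user $u$, $r_{v,i}$ is neighbor $v$'s rating of item $i$, and $\text{sim}(u,v)$ is the chosen similarity.

$$
\hat{r}_{u,i} = \frac{\sum_{v \in N_k(u)} \text{sim}(u,v)\, r_{v,i}}{\sum_{v \in N_k(u)} \left|\text{sim}(u,v)\right|}
$$

With cosine similarity on non-negative ratings and $k = 10$, this has the same form as the `weighted_ratings` line in `recommend_movies_user_based`, with one difference: the notebook's matrix stores unrated movies as 0, so they pull the score down instead of being skipped.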

**3. K-Nearest Neighbors (k-NN):**
```
User-Based CF = k-NN in user space
Item-Based CF = k-NN in item space
```
- Find k most similar neighbors
- Aggregate their preferences
- **No training phase:** k-NN is a "lazy" method that simply looks up neighbors at prediction time

**4. Probability & Bayesian Inference:**
```
P(user likes item | user history) = P(history | likes) × P(likes) / P(history)
```
- Prior: Item popularity
- Evidence: User's rating history
- Posterior: Probability of liking
- **Used in:** Bayesian personalized ranking

### Statistical Challenges:

**Sparsity:**
- Problem: Most users rate <1% of items
- Solution: Matrix factorization, embeddings

**Cold Start:**
- Problem: No data for new users/items
- Solution: Content-based features, hybrid methods

**Bias:**
- Problem: Popular items over-represented
- Solution: Debiasing algorithms, exploration vs exploitation

**Variance:**
- Problem: Different users use rating scales differently
- Solution: Normalize ratings, use Pearson correlation

---

## 🎓 The Big Picture

```
Classical Statistics → Machine Learning → Deep Learning
      ↓                      ↓                  ↓
Correlation,           Matrix             Neural
Distance,           Factorization,    Collaborative
k-NN                Gradient Descent   Filtering
```

**Key Insight:**
- Start with statistics (what you learned today)
- The intuition carries forward to neural approaches
- Modern systems just have more powerful tools for the same goal: **predict preferences**

---


# 7. Glossary

## 📚 Key Terminology

**See Also:** AI Explorer → `/recommendation-systems` → Glossary

---

### Core Concepts

**Collaborative Filtering (CF):**
- Making predictions based on patterns in user-item interactions
- "Collaborate" = use information from multiple users
- No need for item content information

**User-Item Matrix:**
- Rows = Users, Columns = Items, Values = Ratings
- Most values are missing (sparse matrix)
- Goal: Predict missing values

**Sparsity:**
- Percentage of missing values in user-item matrix
- Typical: 99%+ sparsity (users rate <1% of items)
- Major challenge for recommendation systems

**Cold Start Problem:**
- New users: No rating history → can't find similar users
- New items: No ratings yet → can't compute similarity
- Solution: Hybrid methods (content + collaborative)

---

### Similarity Metrics

**Cosine Similarity:**
```
cos(θ) = (A · B) / (||A|| × ||B||)
```
- Measures angle between two vectors
- Range: -1 to 1 (0 to 1 for ratings)
- 1 = identical, 0 = orthogonal, -1 = opposite
- **Best for:** Sparse data, magnitude-independent

**Pearson Correlation:**
```
r = Σ(x-x̄)(y-ȳ) / √[Σ(x-x̄)² Σ(y-ȳ)²]
```
- Measures linear correlation
- Accounts for different rating scales
- Range: -1 to 1
- **Best for:** Users with rating bias (harsh/lenient raters)

**Euclidean Distance:**
```
d = √[Σ(xᵢ - yᵢ)²]
```
- Straight-line distance in multi-dimensional space
- Range: 0 to ∞ (smaller = more similar)
- Often converted to similarity: 1 / (1 + distance)
- **Best for:** Dense data, when magnitude matters
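
A small sketch of the distance-to-similarity conversion described above. `euclidean_similarity` is a hypothetical helper; the first vector is Jane's ratings from Section 3 and the second is made up.

```python
import math

def euclidean_similarity(a, b):
    """Convert Euclidean distance to a similarity in (0, 1] via 1 / (1 + d)."""
    d = math.dist(a, b)   # straight-line distance (requires Python 3.8+)
    return 1.0 / (1.0 + d)

identical = euclidean_similarity([5.0, 4.0, 4.5], [5.0, 4.0, 4.5])
far_apart = euclidean_similarity([5.0, 4.0, 4.5], [1.0, 2.0, 0.5])

print(f"Identical ratings: {identical:.3f}")
print(f"Very different:    {far_apart:.3f}")
```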

**Jaccard Similarity:**
```
J(A,B) = |A ∩ B| / |A ∪ B|
```
- Ratio of intersection to union of sets
- Range: 0 to 1
- **Best for:** Binary data (liked/didn't like, watched/didn't watch)
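
Jaccard similarity maps directly onto Python sets. The sketch below uses a hypothetical `jaccard` helper and made-up watch histories.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical watch histories
jane_watched = {"The Godfather", "Star Wars", "Jaws"}
bob_watched = {"The Godfather", "Star Wars", "Pulp Fiction"}

overlap = jaccard(jane_watched, bob_watched)
print(overlap)
```

Two of the four distinct movies appear in both histories, so the similarity is 2 / 4.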

---

### Evaluation Metrics

**RMSE (Root Mean Square Error):**
```
RMSE = √[Σ(r̂ - r)² / n]
```
- Measures prediction accuracy
- Penalizes large errors heavily
- Lower is better
- **Used for:** Rating prediction accuracy

**MAE (Mean Absolute Error):**
```
MAE = Σ|r̂ - r| / n
```
- Average absolute difference
- More robust to outliers than RMSE
- Lower is better
- **Used for:** Rating prediction when outliers exist
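
To see how RMSE and MAE react differently to one large miss, here is a minimal sketch with made-up predictions. The helper names `rmse` and `mae` are chosen for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions: three small misses (0.5) and one large miss (3.0)
predicted = [4.0, 3.5, 5.0, 1.0]
actual = [4.5, 3.0, 4.5, 4.0]

print(f"RMSE: {rmse(predicted, actual):.3f}")
print(f"MAE:  {mae(predicted, actual):.3f}")
```

The single 3-star miss inflates RMSE more than MAE because errors are squared before averaging.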

**Precision & Recall:**
```
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
```
- Precision: Of recommended items, how many were relevant?
- Recall: Of relevant items, how many were recommended?
- Trade-off: High precision vs high recall
- **Used for:** Top-N recommendation quality
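
For top-N lists, precision and recall are usually computed over the first k recommendations. A minimal sketch follows, with a hypothetical `precision_recall_at_k` helper and made-up data; "relevant" here stands for movies the user liked in held-out ratings.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall over the first k recommendations."""
    top_k = recommended[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)   # true positives
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked recommendations and held-out liked movies
recommended = ["Pulp Fiction", "Titanic", "Alien", "Heat", "Frozen"]
relevant = ["Pulp Fiction", "Heat", "Casablanca"]

precision, recall = precision_recall_at_k(recommended, relevant, k=5)
print(f"Precision@5: {precision:.2f}, Recall@5: {recall:.2f}")
```

Shrinking k to 1 raises precision to 1.0 here but drops recall to one in three, which is the trade-off noted above.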

**Coverage:**
```
Coverage = |Items Recommended| / |Total Items|
```
- Percentage of catalog that can be recommended
- Measures diversity and long-tail coverage
- Higher is better (avoid popularity bias)
- **Used for:** Ensuring diverse recommendations
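
Coverage can be computed by pooling every user's recommendation list and comparing it with the catalog. The sketch below uses a hypothetical `catalog_coverage` helper and made-up lists.

```python
def catalog_coverage(recommendation_lists, catalog):
    """Fraction of the catalog that appears in at least one user's recommendations."""
    catalog = set(catalog)
    recommended = set()
    for recs in recommendation_lists:
        recommended.update(recs)
    return len(recommended & catalog) / len(catalog) if catalog else 0.0

# Hypothetical top-2 lists for three users over a five-movie catalog
catalog = ["Pulp Fiction", "Titanic", "Alien", "Heat", "Frozen"]
lists = [
    ["Pulp Fiction", "Heat"],
    ["Pulp Fiction", "Titanic"],
    ["Heat", "Pulp Fiction"],
]

coverage = catalog_coverage(lists, catalog)
print(f"Coverage: {coverage:.0%}")
```

Only three of the five movies are ever recommended, and "Pulp Fiction" appears in every list, a small-scale picture of popularity bias.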

---

### Advanced Concepts

**Matrix Factorization:**
```
R ≈ U × I^T
```
- Decompose user-item matrix into user factors × item factors
- Latent factors capture hidden patterns
- A key ingredient of the Netflix Prize-winning solutions, which blended many models

**Embeddings:**
- Dense vector representations of users/items
- Learned automatically by neural networks
- Similar to word2vec for text
- Example: User = [0.2, 0.8, 0.1, ...]

**Neural Collaborative Filtering (NCF):**
- Replace dot product with neural network
- Learn non-linear user-item interactions
- Combine with content features

**K-Nearest Neighbors (k-NN):**
- Find k most similar users/items
- Aggregate their preferences
- User-based CF is k-NN in user space
- Item-based CF is k-NN in item space

---

## 🔗 Cross-References

For more details on these concepts:

- **AI Explorer Main Page:** `/recommendation-systems`
- **Similarity Metrics:** Glossary section with formulas and use cases
- **Evaluation Metrics:** When to use which metric
- **Neural Networks:** Evolution from CF to deep learning
- **Statistics:** Statistical foundations and challenges

---


# 🎓 Summary & Next Steps

## What You Learned Today:

✅ **Core Concepts:**
- Collaborative filtering finds patterns in user-item interactions
- Two main approaches: User-based and Item-based
- Both use similarity metrics to make predictions

✅ **User-Based CF:**
- Find users with similar taste
- Recommend what they liked
- "People like you also liked..."

✅ **Item-Based CF:**
- Find items similar to user's favorites
- Recommend those similar items
- "Because you watched..."

✅ **Connections:**
- Statistics: Correlation, k-NN, regression
- Machine Learning: Matrix factorization, optimization
- Deep Learning: Embeddings, neural networks

---

## 🚀 Next Steps:

**1. Practice:**
- Try different similarity metrics
- Experiment with k (number of neighbors)
- Compare recommendations for different users

**2. Explore AI Explorer:**
- Read the full glossary
- Study the neural networks connection
- Review evaluation metrics

**3. Advanced Topics:**
- Matrix factorization (SVD, ALS)
- Hybrid recommendation systems
- Deep learning approaches (NCF, autoencoders)
- Context-aware recommendations

**4. Real-World Application:**
- Build your own recommendation engine
- Apply to different domains (music, products, news)
- Deploy and evaluate with real users

---

## 📚 Additional Resources:

**Datasets:**
- MovieLens: https://grouplens.org/datasets/movielens/
- Last.fm Music: https://grouplens.org/datasets/hetrec-2011/
- Amazon Product Reviews: http://jmcauley.ucsd.edu/data/amazon/

**Papers:**
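- Item-Based CF (2001): Sarwar, Karypis, Konstan, and Riedl (GroupLens); Amazon's item-to-item CF article by Linden, Smith, and York followed in 2003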
- Netflix Prize (2009): Matrix factorization breakthroughs
- NCF (2017): Neural collaborative filtering

**AI Explorer:**
- Full documentation and examples
- Interactive visualizations
- Practice exercises

---

## ✨ Final Thoughts:

Recommendation systems are one of the most **practical** and **impactful** applications of AI:

- They power the platforms you use every day
- They're based on intuitive statistical principles
- They evolve from simple methods to deep learning
- They have real business value (recommendations are often credited with about 35% of Amazon's revenue)

**The journey from collaborative filtering to neural recommendations mirrors the evolution of AI itself** - from statistics to machine learning to deep learning, always solving the same fundamental problem: understanding and predicting human preferences.

**Keep exploring, keep learning!** 🚀

---