## Case Study: Building a Movie Recommendation System

This case study demonstrates how to develop a recommendation system using the **MovieLens Dataset**. We explore collaborative filtering, content-based filtering, and hybrid models to recommend movies based on user preferences and behavior.

### Dataset Overview

The **MovieLens Dataset** is a popular benchmark for evaluating recommendation systems. The figures below match the small `ml-latest-small` release, which contains:
- **Users**: Over 600 users with unique IDs.
- **Movies**: Over 9,000 movies with a title (which includes the release year) and a list of genres.
- **Ratings**: User-movie interactions, where each rating is a value from 0.5 to 5 in half-star increments.

The dataset can be downloaded from [MovieLens](https://grouplens.org/datasets/movielens/).
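
Before modeling, it helps to know how sparse the ratings are, since most users rate only a small fraction of the catalog. The sketch below measures this on a tiny hand-made frame; `toy_ratings` and `rating_density` are hypothetical names, the values are invented, and the code assumes at most one rating per user-movie pair.

```python
import pandas as pd

# Toy stand-in for ratings.csv (invented values, same column names)
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3, 3],
    "movieId": [10, 20, 10, 20, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

def rating_density(ratings: pd.DataFrame) -> float:
    """Fraction of the user x movie grid that actually holds a rating."""
    n_users = ratings["userId"].nunique()
    n_movies = ratings["movieId"].nunique()
    return len(ratings) / (n_users * n_movies)

density = rating_density(toy_ratings)
print(f"Density: {density:.2%}, sparsity: {1 - density:.2%}")
```

Running the same function on the real `ratings` frame gives a sense of how much of the user-item matrix will be filled with placeholder values later on.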

## Step 1: Data Preparation

In [ ]:
import pandas as pd

# Load the dataset
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

# Merge datasets for a complete view
data = pd.merge(ratings, movies, on='movieId')

# Display the first few rows
print(data.head())

### Data Cleaning and Transformation

In [ ]:
# Check for missing values
print(data.isnull().sum())

# Scale ratings into the (0, 1] range by dividing by the maximum rating
data['rating_normalized'] = data['rating'] / data['rating'].max()

## Step 2: Collaborative Filtering

Collaborative filtering predicts a user’s preferences by analyzing the behavior of similar users or items.

In [ ]:
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

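# Create a user-item matrix. pivot_table averages ratings for movies that share
# a title, and unrated movies are filled with 0 as a simple placeholder.
user_item_matrix = data.pivot_table(index='userId', columns='title', values='rating').fillna(0)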

# Apply Singular Value Decomposition (SVD)
svd = TruncatedSVD(n_components=50, random_state=42)
latent_matrix = svd.fit_transform(user_item_matrix)

# Compute cosine similarity between users
user_similarity = cosine_similarity(latent_matrix)

# Recommend movies for a specific user
def recommend_movies(user_id, top_n=5):
    # Look up the row position by label; userId values need not be contiguous
    user_index = user_item_matrix.index.get_loc(user_id)
    similar_users = np.argsort(-user_similarity[user_index])  # Most similar first
    similar_users = similar_users[similar_users != user_index]  # Skip self
    similar_user_ratings = user_item_matrix.iloc[similar_users[0]]
    # Keep only movies the target user has not rated yet
    unseen = user_item_matrix.iloc[user_index] == 0
    candidates = similar_user_ratings[unseen & (similar_user_ratings > 0)]
    return candidates.sort_values(ascending=False).head(top_n).index

# Get recommendations for user 1
print(f"Recommendations for User 1: {recommend_movies(1)}")
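
The function above copies the favorites of the single most similar user. A common refinement is to predict a rating as a similarity-weighted average over all users who rated the movie. The following is a minimal sketch of that idea on a toy matrix, not the method used elsewhere in this case study; `R`, `cosine`, and `predict_rating` are hypothetical names, and zeros are treated as real values in the similarity, mirroring the `fillna(0)` simplification.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: movies); 0 means "not rated"
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 3.0, 1.0],
    [1.0, 1.0, 5.0, 5.0],
])

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(R, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue  # skip the target user and users who did not rate the item
        sim = cosine(R[user], R[other])
        num += sim * R[other, item]
        den += abs(sim)
    return num / den if den else 0.0

pred = predict_rating(R, user=0, item=2)
print(f"Predicted rating for user 0 on movie 2: {pred:.2f}")
```

Because user 0 rates much like user 1, the prediction lands closer to user 1's rating for that movie than to user 2's.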

## Step 3: Content-Based Filtering

Content-based filtering recommends movies similar to those a user has already liked by analyzing item attributes.

In [ ]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Create a genre matrix. Genres are '|'-separated, so split on that character;
# the default tokenizer would break "Sci-Fi" into two unrelated tokens.
vectorizer = CountVectorizer(tokenizer=lambda s: s.split('|'), token_pattern=None)
genre_matrix = vectorizer.fit_transform(movies['genres'])

# Compute cosine similarity between movies
genre_similarity = cosine_similarity(genre_matrix)

# Recommend movies based on similarity to a given movie
def recommend_similar_movies(movie_title, top_n=5):
    # Row position of the first movie with this title
    movie_index = np.flatnonzero(movies['title'] == movie_title)[0]
    similar_movies = np.argsort(-genre_similarity[movie_index], kind='stable')
    # Drop the input movie explicitly; ties mean it is not always ranked first
    similar_movies = similar_movies[similar_movies != movie_index]
    return movies['title'].iloc[similar_movies[:top_n]]

# Get similar movies to "Toy Story (1995)"
print(f"Movies similar to 'Toy Story (1995)': {recommend_similar_movies('Toy Story (1995)')}")
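
The function above finds neighbors of one movie. To recommend from everything a user has liked, one option is to average the feature vectors of the liked movies into a profile and rank unseen movies against it. This is a minimal sketch under that assumption; the titles, the three-genre feature layout, and `rank_by_profile` are all hypothetical.

```python
import numpy as np

# Hypothetical binary genre vectors in the order [Action, Comedy, Romance]
item_features = {
    "Movie A": np.array([1, 0, 0]),
    "Movie B": np.array([1, 1, 0]),
    "Movie C": np.array([0, 1, 1]),
    "Movie D": np.array([0, 0, 1]),
}

def rank_by_profile(liked, item_features):
    """Rank movies the user has not liked yet by cosine similarity to the profile."""
    # The profile is the mean genre vector of the liked movies
    profile = np.mean([item_features[t] for t in liked], axis=0)
    scores = {}
    for title, vec in item_features.items():
        if title in liked:
            continue
        denom = np.linalg.norm(profile) * np.linalg.norm(vec)
        scores[title] = float(profile @ vec / denom) if denom else 0.0
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_by_profile(["Movie A"], item_features)
print(ranking)
```

A user who liked only an action movie gets the action-comedy ranked ahead of the two movies with no action component.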

## Step 4: Hybrid Model

Hybrid models combine collaborative and content-based filtering for improved recommendations.

In [ ]:
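from itertools import chain, zip_longest

# Combine collaborative filtering and content-based recommendations
def hybrid_recommendation(user_id, movie_title, top_n=5):
    # Collaborative filtering recommendations
    collaborative_recs = recommend_movies(user_id, top_n)

    # Content-based recommendations
    content_recs = recommend_similar_movies(movie_title, top_n)

    # Interleave the two lists (equal weight), then drop duplicates while
    # preserving order; a plain set union would return an arbitrary order
    interleaved = chain.from_iterable(zip_longest(collaborative_recs, content_recs))
    combined_recs = list(dict.fromkeys(t for t in interleaved if t is not None))
    return combined_recs[:top_n]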

# Hybrid recommendation for User 1 and "Toy Story (1995)"
print(f"Hybrid Recommendations: {hybrid_recommendation(1, 'Toy Story (1995)')}")
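
Merging two ranked lists ignores how strongly each model prefers an item. When both models expose scores, a weighted sum is a common alternative. The sketch below assumes both score sets are already on a comparable 0 to 1 scale; `blend_scores`, the weight `alpha`, and the example scores are hypothetical.

```python
def blend_scores(cf_scores, cb_scores, alpha=0.7):
    """Weighted sum of two score dicts; a missing score counts as 0."""
    titles = set(cf_scores) | set(cb_scores)
    blended = {
        t: alpha * cf_scores.get(t, 0.0) + (1 - alpha) * cb_scores.get(t, 0.0)
        for t in titles
    }
    # Sort by score (descending), breaking ties alphabetically for determinism
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))

cf = {"Movie A": 0.9, "Movie B": 0.4}   # collaborative filtering scores
cb = {"Movie B": 0.8, "Movie C": 0.6}   # content-based scores
ranked = blend_scores(cf, cb, alpha=0.5)
print([title for title, _ in ranked])
```

A larger `alpha` trusts the collaborative model more, which suits users with a long rating history; a smaller value leans on item metadata.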

## Step 5: Evaluation

### Evaluation Metrics

In [ ]:
from sklearn.metrics import mean_absolute_error

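# Calculate MAE for collaborative filtering on the movies user 1 actually rated.
# Note: this is reconstruction error on the training data, not held-out accuracy.
user_id = 1
user_pos = user_item_matrix.index.get_loc(user_id)  # Row position of this user
user_ratings = user_item_matrix.loc[user_id]
predicted_ratings = svd.inverse_transform(latent_matrix)[user_pos]
rated = (user_ratings > 0).to_numpy()
mae = mean_absolute_error(user_ratings[rated], predicted_ratings[rated])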
print(f"Mean Absolute Error (MAE): {mae}")
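
The MAE above is measured on ratings the model was fitted on, so it tends to look optimistic. A more honest estimate hides some known ratings before fitting and scores only those. The following sketch shows the idea on a toy matrix with NumPy's SVD; the matrix values, the held-out cells, and the rank `k` are all assumptions chosen for illustration.

```python
import numpy as np

# Toy matrix of known ratings (invented values)
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Hold out a few known ratings as the test set
test_cells = [(0, 1), (2, 3), (3, 0)]
train = R.copy()
for u, i in test_cells:
    train[u, i] = 0.0  # hide the rating, as fillna(0) would for a missing one

# Rank-k reconstruction fitted on the training matrix only
k = 2
U, s, Vt = np.linalg.svd(train, full_matrices=False)
approx = (U[:, :k] * s[:k]) @ Vt[:k]

# Score only the held-out cells
y_true = np.array([R[u, i] for u, i in test_cells])
y_pred = np.array([approx[u, i] for u, i in test_cells])
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"Held-out MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a fuller picture of prediction quality.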

## Step 6: Deployment and Applications

### Deployment Options:
- **Batch Recommendations**: Precompute recommendations for all users and store them for fast retrieval.
- **Real-Time Recommendations**: Generate recommendations dynamically using pre-trained models and updated user data.
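
The batch option can be sketched as a lookup table built offline and queried at request time. The example below assumes a precomputed user-by-movie score matrix; `precompute_top_n`, the user IDs, titles, and scores are hypothetical placeholders for whatever the trained model produces.

```python
import numpy as np

user_ids = [101, 102, 103]
titles = np.array(["Movie A", "Movie B", "Movie C", "Movie D"])
# Hypothetical model scores: one row per user, one column per movie
scores = np.array([
    [0.90, 0.10, 0.50, 0.30],
    [0.20, 0.80, 0.40, 0.60],
    [0.30, 0.70, 0.95, 0.10],
])

def precompute_top_n(user_ids, titles, scores, top_n=2):
    """Build a user -> top-N titles table in one vectorized pass."""
    top_idx = np.argsort(-scores, axis=1)[:, :top_n]  # best columns per row
    return {uid: titles[idx].tolist() for uid, idx in zip(user_ids, top_idx)}

# Offline job: compute once and store (here, an in-memory dict)
cache = precompute_top_n(user_ids, titles, scores)

# Request time: a constant-time lookup instead of a model call
print(cache[102])
```

In production the dictionary would typically live in a key-value store and be refreshed on a schedule, while the real-time option would recompute scores per request.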

### Applications:
1. **Streaming Services**: Personalized movie or song recommendations.
2. **E-Commerce**: Product recommendations based on browsing or purchase history.
3. **Education Platforms**: Course recommendations tailored to user learning progress.

## Summary and Recommendations

### Summary:
- **Collaborative Filtering**: Suitable for platforms with abundant user-item interaction data. Matrix factorization (e.g., SVD) compresses the sparse user-item matrix into a small number of latent factors.
- **Content-Based Filtering**: Useful for new platforms or items with rich metadata. It relies on item attributes, making it less dependent on user interaction data.
- **Hybrid Models**: Combine the strengths of collaborative and content-based filtering, mitigating the cold start problem and improving accuracy.
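
One simple way to soften the cold start problem for new users is to fall back to popular titles when no history exists. This is a minimal sketch under that assumption; `popular_titles`, `recommend_with_fallback`, the `min_ratings` threshold, and the toy data are hypothetical.

```python
import pandas as pd

# Toy merged ratings (invented values)
toy = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title":  ["Movie A", "Movie B", "Movie A", "Movie C", "Movie A"],
    "rating": [5.0, 3.0, 4.0, 4.5, 4.0],
})

def popular_titles(ratings, min_ratings=2, top_n=3):
    """Best-rated titles among those with at least `min_ratings` ratings."""
    stats = ratings.groupby("title")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_ratings]
    ordered = stats.sort_values(["mean", "count"], ascending=False)
    return ordered.head(top_n).index.tolist()

def recommend_with_fallback(user_id, ratings, personalized):
    """Use `personalized` (a callable: user_id -> titles) only for known users."""
    if not ratings["userId"].eq(user_id).any():
        return popular_titles(ratings)  # new user: no history to personalize on
    return personalized(user_id)

# User 99 has no ratings, so the popularity fallback is used
print(recommend_with_fallback(99, toy, lambda uid: ["personalized pick"]))
```

The `min_ratings` threshold keeps a single enthusiastic rating from pushing an obscure title to the top of the fallback list.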

### Recommendations:
- Use **collaborative filtering** for platforms with a large user base and extensive interaction data.
- Leverage **content-based filtering** for platforms with rich item metadata or in cases of cold start for new items.
- Adopt **hybrid models** for comprehensive recommendations, balancing personalization and cold start challenges.

By following these steps, practitioners can build effective recommendation systems to enhance user experience, drive engagement, and improve retention.