# Movie Recommendation System

This notebook creates personalized movie recommendations using collaborative filtering based on your Letterboxd ratings and similar users' preferences.

## Features:
- Uses your complete Letterboxd rating history
- Excludes movies you've already watched 
- Gives bonus weighting to movies rated 5/5 stars by similar users
- Finds users with similar taste through cosine similarity

In [66]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
import re

# Load ratings export dataset
ratings = pd.read_csv('data/ratings_export.csv')

# Robust CSV loading for movie database with multiple fallback methods
try:
    movies = pd.read_csv('data/movie_data.csv', 
                        on_bad_lines='skip',
                        quoting=1,  # QUOTE_ALL
                        engine='python')
    print("✅ Successfully loaded movies data")
except Exception as e:
    print(f"⚠️  First attempt failed, trying alternative method...")
    try:
        movies = pd.read_csv('data/movie_data.csv', 
                            on_bad_lines='skip',
                            sep=',',
                            engine='python',
                            encoding='utf-8',
                            quotechar='"',
                            escapechar='\\')
        print("✅ Successfully loaded movies data with method 2")
    except Exception as e2:
        print(f"⚠️  Loading sample data as fallback...")
        movies = pd.read_csv('data/movie_data.csv', 
                           nrows=10000,
                           on_bad_lines='skip',
                           engine='python')
        print("✅ Successfully loaded sample movies data")

# Merge datasets to create comprehensive user ratings
user_ratings = pd.merge(ratings, movies, left_on='movie_id', right_on='movie_id')

print(f"📊 Dataset Summary:")
print(f"   • {len(ratings):,} user ratings loaded")
print(f"   • {len(movies):,} movies in database")  
print(f"   • {len(user_ratings):,} combined user-movie ratings")

# Movie name cleaning function for matching
def clean_movie_name(name):
    """Clean movie names for better matching"""
    if pd.isna(name) or not isinstance(name, str):
        return ""
    name = re.sub(r'^The\s+', '', name, flags=re.IGNORECASE)
    name = re.sub(r'\s*\([^)]*\)$', '', name)  # Remove year/info in parentheses
    name = re.sub(r'[^\w\s]', '', name)  # Remove special characters
    return name.strip().lower()

✅ Successfully loaded movies data
📊 Dataset Summary:
   • 11,078,167 user ratings loaded
   • 285,963 movies in database
   • 11,079,666 combined user-movie ratings
📊 Dataset Summary:
   • 11,078,167 user ratings loaded
   • 285,963 movies in database
   • 11,079,666 combined user-movie ratings


## Step 1: Load Your Letterboxd Ratings

In [67]:
# Let's use the original clean ratings.csv instead of the processed version
print("Loading original ratings.csv with all your clean movie ratings...")

# Load your original clean ratings
original_ratings = pd.read_csv('data/ratings.csv')
print(f"Original ratings loaded: {len(original_ratings)} movies")
print(f"Rating range: {original_ratings['Rating'].min()} to {original_ratings['Rating'].max()}")
print(f"Movies with ratings > 0: {(original_ratings['Rating'] > 0).sum()}")

# Show sample
print("\nSample of your original ratings:")
print(original_ratings[['Name', 'Year', 'Rating']].head(10))

Loading original ratings.csv with all your clean movie ratings...
Original ratings loaded: 204 movies
Rating range: 0.5 to 5.0
Movies with ratings > 0: 204

Sample of your original ratings:
                         Name  Year  Rating
0                Interstellar  2014     5.0
1        (500) Days of Summer  2009     4.5
2       Friends with Benefits  2011     3.5
3  Terminator 2: Judgment Day  1991     3.5
4               Groundhog Day  1993     4.0
5            The Hunger Games  2012     3.5
6                  About Time  2013     5.0
7                      Barbie  2023     3.0
8                        Dune  2021     4.0
9        John Wick: Chapter 4  2023     3.5


## Step 2: Load and Match Your Watched Movies

In [68]:
# Load your watched movies to exclude them from recommendations
print("Loading your watched movies to exclude from recommendations...")
watched_movies = pd.read_csv('data/watched.csv')
print(f"Total watched movies: {len(watched_movies)}")

# Clean watched movie names for matching
watched_movies['clean_name'] = watched_movies['Name'].apply(clean_movie_name)

# Match watched movies with the database
watched_matched = watched_movies.merge(
    movies_clean_filtered[['tmdb_id', 'movie_title', 'year_released', 'clean_title']], 
    left_on=['clean_name', 'Year'], 
    right_on=['clean_title', 'year_released'], 
    how='inner'
)

print(f"Matched {len(watched_matched)} watched movies with database")

# Also try name-only matching for watched movies
watched_unmatched = watched_movies[~watched_movies.index.isin(watched_matched.index)]
if len(watched_unmatched) > 0:
    watched_name_matches = watched_unmatched.merge(
        movies_clean_filtered[['tmdb_id', 'movie_title', 'year_released', 'clean_title']],
        left_on='clean_name',
        right_on='clean_title',
        how='inner'
    )
    if len(watched_name_matches) > 0:
        watched_matched = pd.concat([watched_matched, watched_name_matches], ignore_index=True)
        print(f"Found {len(watched_name_matches)} additional watched movie matches")

# Create set of all movies you've watched (both rated and unrated)
all_watched_tmdb_ids = set(watched_matched['tmdb_id'].unique())
rated_tmdb_ids = set(my_ratings_final['tmdb_id'].unique())

print(f"Total unique movies you've watched: {len(all_watched_tmdb_ids)}")
print(f"Movies you've rated: {len(rated_tmdb_ids)}")
print(f"Movies watched but not rated: {len(all_watched_tmdb_ids - rated_tmdb_ids)}")

# Sample of watched but not rated movies
unrated_watched = all_watched_tmdb_ids - rated_tmdb_ids
if unrated_watched:
    sample_unrated = list(unrated_watched)[:5]
    print("Sample movies watched but not rated:")
    for tmdb_id in sample_unrated:
        movie_info = movies_clean_filtered[movies_clean_filtered['tmdb_id'] == tmdb_id]
        if len(movie_info) > 0:
            title = movie_info.iloc[0]['movie_title']
            year = movie_info.iloc[0]['year_released']
            print(f"  - {title} ({year})")

Loading your watched movies to exclude from recommendations...
Total watched movies: 426
Matched 392 watched movies with database
Found 70 additional watched movie matches
Total unique movies you've watched: 426
Movies you've rated: 208
Movies watched but not rated: 229
Sample movies watched but not rated:
  - Triple Frontier (2019.0)
  - Neighbors (2014.0)
  - Pitch Perfect 2 (2015.0)
  - Deadpool 2 (2018.0)
  - Fantastic Four (2005.0)


In [69]:
# Now let's match these movies with the movie database using movie names
print("Matching your rated movies with the movie database...")

# First, let's clean the movie names for better matching
import re

def clean_movie_name(name):
    """Clean movie names for better matching"""
    if pd.isna(name) or not isinstance(name, str):
        return ""
    # Remove common prefixes/suffixes and normalize
    name = re.sub(r'^The\s+', '', name, flags=re.IGNORECASE)
    name = re.sub(r'\s*\([^)]*\)$', '', name)  # Remove year/info in parentheses
    name = re.sub(r'[^\w\s]', '', name)  # Remove special characters
    name = name.strip().lower()
    return name

# Clean names in both datasets
original_ratings['clean_name'] = original_ratings['Name'].apply(clean_movie_name)
movies_clean_filtered = movies_clean.dropna(subset=['movie_title', 'year_released'])
movies_clean_filtered['clean_title'] = movies_clean_filtered['movie_title'].apply(clean_movie_name)

# Try matching by cleaned names and year
matched_movies = original_ratings.merge(
    movies_clean_filtered[['tmdb_id', 'movie_title', 'year_released', 'clean_title']], 
    left_on=['clean_name', 'Year'], 
    right_on=['clean_title', 'year_released'], 
    how='inner'
)

print(f"Matched {len(matched_movies)} movies by exact name and year")

# For unmatched movies, try matching by name only (ignore year)
unmatched = original_ratings[~original_ratings.index.isin(matched_movies.index)]
if len(unmatched) > 0:
    print(f"Attempting name-only matching for {len(unmatched)} unmatched movies...")
    
    name_only_matches = unmatched.merge(
        movies_clean_filtered[['tmdb_id', 'movie_title', 'year_released', 'clean_title']],
        left_on='clean_name',
        right_on='clean_title',
        how='inner'
    )
    
    if len(name_only_matches) > 0:
        # Remove the suffixes from the merge
        name_only_matches = name_only_matches.drop(columns=['clean_title'])
        matched_movies = pd.concat([matched_movies, name_only_matches], ignore_index=True)
        print(f"Found {len(name_only_matches)} additional matches by name only")

print(f"\nFinal matching results: {len(matched_movies)} out of {len(original_ratings)} movies matched")
print(f"Match rate: {len(matched_movies)/len(original_ratings)*100:.1f}%")

# Create the final ratings dataset
my_ratings_final = matched_movies[['tmdb_id', 'Rating']].copy()
my_ratings_final['user_id'] = "brimell"

print(f"\nYour final ratings dataset: {len(my_ratings_final)} movies")
print(f"Rating distribution:")
rating_dist = my_ratings_final['Rating'].value_counts().sort_index()
for rating, count in rating_dist.items():
    print(f"  {rating}: {count} movies")

# Show sample matches
print(f"\nSample matched movies:")
sample_matches = matched_movies[['Name', 'movie_title', 'Year', 'year_released', 'Rating']].head(10)
for _, row in sample_matches.iterrows():
    print(f"  {row['Name']} ({row['Year']}) -> {row['movie_title']} ({row['year_released']}) - Rating: {row['Rating']}")

Matching your rated movies with the movie database...
Matched 170 movies by exact name and year
Attempting name-only matching for 34 unmatched movies...
Found 69 additional matches by name only

Final matching results: 239 out of 204 movies matched
Match rate: 117.2%

Your final ratings dataset: 239 movies
Rating distribution:
  0.5: 1 movies
  1.0: 14 movies
  1.5: 6 movies
  2.0: 13 movies
  2.5: 11 movies
  3.0: 20 movies
  3.5: 44 movies
  4.0: 83 movies
  4.5: 35 movies
  5.0: 12 movies

Sample matched movies:
  Interstellar (2014) -> Interstellar (2014.0) - Rating: 5.0
  (500) Days of Summer (2009) -> (500) Days of Summer (2009.0) - Rating: 4.5
  Friends with Benefits (2011) -> Friends with Benefits (2011.0) - Rating: 3.5
  Terminator 2: Judgment Day (1991) -> Terminator 2: Judgment Day (1991.0) - Rating: 3.5
  Groundhog Day (1993) -> Groundhog Day (1993.0) - Rating: 4.0
  The Hunger Games (2012) -> The Hunger Games (2012.0) - Rating: 3.5
  About Time (2013) -> About Time (2013.0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  movies_clean_filtered['clean_title'] = movies_clean_filtered['movie_title'].apply(clean_movie_name)


## Step 3: Match Ratings with Movie Database

In [70]:
# Now rebuild the recommendation system with the complete dataset
print("Rebuilding recommendation system with your complete rating dataset...")

# Update the variable to use our new complete dataset
my_ratings_updated = my_ratings_final

# Rebuild the user mapping
combined_ratings_new = pd.concat([user_ratings, my_ratings_updated.rename(columns={'Rating': 'rating_val'})])
combined_ratings_new = combined_ratings_new.dropna(subset=['user_id', 'rating_val'])

# Create new mappings
tmdb_id_to_idx_new = {tmdb_id: i for i, tmdb_id in enumerate(combined_ratings_new['tmdb_id'].unique())}
user_id_to_idx_new = {user_id: i + 1 for i, user_id in enumerate(combined_ratings_new['user_id'].unique())}
user_id_to_idx_new["brimell"] = 0

print(f"You now have {len(my_ratings_updated)} rated movies in the system")
print(f"Total combined ratings: {len(combined_ratings_new)}")

# Find similar users
your_movies_new = combined_ratings_new[combined_ratings_new['user_id'] == "brimell"]
common_movies_new = pd.merge(your_movies_new, combined_ratings_new, on='tmdb_id')
common_movies_count_new = common_movies_new.groupby('user_id_y').size()

# Use a lower threshold since we have more movies
min_common = 10
filtered_user_ids_new = common_movies_count_new[common_movies_count_new >= min_common].index
filtered_combined_ratings_new = combined_ratings_new[combined_ratings_new['user_id'].isin(filtered_user_ids_new)]

print(f"Found {len(filtered_user_ids_new)} users who have rated at least {min_common} movies in common")
print(f"Average movies in common: {common_movies_count_new.mean():.1f}")
print(f"Max movies in common: {common_movies_count_new.max()}")

# Create sparse matrix for similarity computation
rows = filtered_combined_ratings_new['user_id'].map(user_id_to_idx_new)
cols = filtered_combined_ratings_new['tmdb_id'].map(tmdb_id_to_idx_new) 
data = filtered_combined_ratings_new['rating_val']
ratings_matrix_new = csr_matrix((data, (rows, cols)), shape=(len(user_id_to_idx_new), len(tmdb_id_to_idx_new)))

print("Computing user similarities...")
user_similarity_new = cosine_similarity(ratings_matrix_new)

# Find most similar users
top_similar_indices_new = np.argsort(-user_similarity_new[0])[1:11]
idx_to_user_new = {v: k for k, v in user_id_to_idx_new.items()}

print("\nTop 10 most similar users:")
for i, idx in enumerate(top_similar_indices_new, 1):
    if idx in idx_to_user_new:
        user_id = idx_to_user_new[idx]
        similarity = user_similarity_new[0][idx]
        print(f"{i:2d}. User: {user_id}, Similarity: {similarity:.4f}")

Rebuilding recommendation system with your complete rating dataset...
You now have 239 rated movies in the system
Total combined ratings: 11079905
You now have 239 rated movies in the system
Total combined ratings: 11079905
Found 7019 users who have rated at least 10 movies in common
Average movies in common: 78.4
Max movies in common: 309
Found 7019 users who have rated at least 10 movies in common
Average movies in common: 78.4
Max movies in common: 309
Computing user similarities...
Computing user similarities...

Top 10 most similar users:
 1. User: bickan, Similarity: 0.3230
 2. User: spchee, Similarity: 0.3105
 3. User: leeyummie, Similarity: 0.3086
 4. User: canasian, Similarity: 0.3081
 5. User: trennison1, Similarity: 0.3067
 6. User: jwill904, Similarity: 0.3065
 7. User: luckyspecialist, Similarity: 0.3058
 8. User: woozy_, Similarity: 0.3032
 9. User: cesourius, Similarity: 0.2977
10. User: dwhite1174, Similarity: 0.2963

Top 10 most similar users:
 1. User: bickan, Similar

## Step 4: Build Collaborative Filtering System

In [71]:
# Generate final movie recommendations with the complete dataset
# EXCLUDING all movies you've watched (both rated and unrated)
# WITH HIGHER WEIGHT for movies rated 5/5 stars (perfect Letterboxd ratings)
print("Generating movie recommendations based on users with similar taste...")
print("Excluding ALL movies you've watched (rated + unrated)...")
print("🌟 BONUS WEIGHT for movies rated 5.0/5.0 (perfect 5-star ratings)!")

# Use ALL watched movies (not just rated ones) for exclusion
my_watched_movies_set = all_watched_tmdb_ids  # This includes both rated and unrated
print(f"Excluding {len(my_watched_movies_set)} movies you've already watched")

recommended_movies_dict = {}

# Use top 200 similar users for better recommendations
top_users_count = min(200, len(top_similar_indices_new))
print(f"Using top {top_users_count} similar users for recommendations")

for idx in top_similar_indices_new[:top_users_count]:
    if idx in idx_to_user_new:
        user_id = idx_to_user_new[idx] 
        user_similarity_score = user_similarity_new[0][idx]
        
        # Get highly rated movies by this user (3.5+ stars equivalent, so 3.5+ on 5 scale)
        user_high_ratings = filtered_combined_ratings_new[
            (filtered_combined_ratings_new['user_id'] == user_id) & 
            (filtered_combined_ratings_new['rating_val'] >= 3.5)
        ]
        
        for _, row in user_high_ratings.iterrows():
            tmdb_id = row['tmdb_id']
            # Exclude movies you've watched (both rated and unrated)
            if tmdb_id not in my_watched_movies_set:
                if tmdb_id not in recommended_movies_dict:
                    recommended_movies_dict[tmdb_id] = {
                        'users': [user_id], 
                        'ratings': [row['rating_val']], 
                        'similarities': [user_similarity_score],
                        'perfect_ratings': 1 if row['rating_val'] == 5.0 else 0  # Track 5.0/5.0 ratings
                    }
                else:
                    recommended_movies_dict[tmdb_id]['users'].append(user_id)
                    recommended_movies_dict[tmdb_id]['ratings'].append(row['rating_val'])
                    recommended_movies_dict[tmdb_id]['similarities'].append(user_similarity_score)
                    if row['rating_val'] == 5.0:
                        recommended_movies_dict[tmdb_id]['perfect_ratings'] += 1

print(f"Found {len(recommended_movies_dict)} potential NEW recommendations")

# Score recommendations with BONUS for 5.0/5.0 ratings (perfect Letterboxd stars)
scored_recommendations = []
perfect_bonus_count = 0

for tmdb_id, data in recommended_movies_dict.items():
    avg_rating = np.mean(data['ratings'])
    avg_similarity = np.mean(data['similarities'])
    num_recommenders = len(data['users'])
    perfect_ratings = data['perfect_ratings']
    
    # Base combined score: rating * similarity * log(count) 
    base_score = avg_rating * avg_similarity * np.log(1 + num_recommenders)
    
    # BONUS for movies with 5.0/5.0 ratings (perfect Letterboxd ratings): 
    # - 1.5x multiplier if any user rated it 5.0/5.0
    # - Additional 0.2x for each perfect rating (diminishing returns)
    perfect_bonus = 1.0
    if perfect_ratings > 0:
        perfect_bonus = 1.5 + (perfect_ratings - 1) * 0.2  # 1.5x for first, 0.2x for each additional
        perfect_bonus_count += 1
    
    combined_score = base_score * perfect_bonus
    
    scored_recommendations.append({
        'tmdb_id': tmdb_id,
        'avg_rating': avg_rating,
        'avg_similarity': avg_similarity, 
        'num_recommenders': num_recommenders,
        'perfect_ratings': perfect_ratings,
        'perfect_bonus': perfect_bonus,
        'combined_score': combined_score
    })

print(f"🌟 {perfect_bonus_count} movies received 5-star bonus weighting!")

# Sort by combined score and get top 50
top_recommendations = sorted(scored_recommendations, key=lambda x: x['combined_score'], reverse=True)[:50]

# Get movie titles
final_recommendations = []
for rec in top_recommendations:
    movie_match = movies_clean_filtered[movies_clean_filtered['tmdb_id'] == rec['tmdb_id']]
    if len(movie_match) > 0:
        title = movie_match.iloc[0]['movie_title']
        year = movie_match.iloc[0]['year_released']
        final_recommendations.append({
            'title': title,
            'year': int(year) if pd.notna(year) else 'Unknown',
            'avg_rating': rec['avg_rating'],
            'avg_similarity': rec['avg_similarity'],
            'num_recommenders': rec['num_recommenders'],
            'perfect_ratings': rec['perfect_ratings'],
            'perfect_bonus': rec['perfect_bonus'],
            'combined_score': rec['combined_score']
        })

print(f"\nTop {len(final_recommendations)} NEW Movie Recommendations (with 5⭐ bonus):")
print("=" * 90)
print("✨ These are movies you HAVEN'T watched yet! ✨")
print("🌟 Movies with 5.0/5.0 ratings get bonus weighting! 🌟")
print("=" * 90)

for i, movie in enumerate(final_recommendations, 1):
    perfect_indicator = f" 🌟×{movie['perfect_ratings']}" if movie['perfect_ratings'] > 0 else ""
    bonus_indicator = f" (×{movie['perfect_bonus']:.1f} bonus)" if movie['perfect_bonus'] > 1.0 else ""
    
    print(f"{i:2d}. {movie['title']} ({movie['year']}){perfect_indicator}")
    print(f"    Avg Rating: {movie['avg_rating']:.2f}/5.0 | Similarity: {movie['avg_similarity']:.3f} | {movie['num_recommenders']} users{bonus_indicator}")
    print()

Generating movie recommendations based on users with similar taste...
Excluding ALL movies you've watched (rated + unrated)...
🌟 BONUS WEIGHT for movies rated 5.0/5.0 (perfect 5-star ratings)!
Excluding 426 movies you've already watched
Using top 10 similar users for recommendations
Found 1183 potential NEW recommendations
🌟 165 movies received 5-star bonus weighting!

Top 50 NEW Movie Recommendations (with 5⭐ bonus):
✨ These are movies you HAVEN'T watched yet! ✨
🌟 Movies with 5.0/5.0 ratings get bonus weighting! 🌟
 1. Batman Begins (2005) 🌟×1
    Avg Rating: 8.11/5.0 | Similarity: 0.308 | 9 users (×1.5 bonus)

 2. There Will Be Blood (2007) 🌟×1
    Avg Rating: 8.71/5.0 | Similarity: 0.308 | 7 users (×1.5 bonus)

 3. Alien (1979) 🌟×1
    Avg Rating: 7.62/5.0 | Similarity: 0.308 | 8 users (×1.5 bonus)

 4. Prisoners (2013) 🌟×1
    Avg Rating: 8.00/5.0 | Similarity: 0.306 | 7 users (×1.5 bonus)

 5. The Dark Knight Rises (2012) 🌟×1
    Avg Rating: 7.38/5.0 | Similarity: 0.308 | 8 users (

## Step 5: Generate Personalized Recommendations

This step generates movie recommendations using the collaborative filtering system with special bonus weighting for movies that similar users rated 5/5 stars.

In [74]:
# Analyze rating distribution to implement extreme weighting
print("📊 Analyzing rating distribution for extreme weighting...")

# Check your rating distribution
your_rating_mean = my_ratings_updated['Rating'].mean()
your_rating_std = my_ratings_updated['Rating'].std()
print(f"Your rating mean: {your_rating_mean:.2f}")
print(f"Your rating std: {your_rating_std:.2f}")

# Check dataset rating distribution  
dataset_rating_mean = filtered_combined_ratings_new['rating_val'].mean()
dataset_rating_std = filtered_combined_ratings_new['rating_val'].std()
print(f"Dataset rating mean: {dataset_rating_mean:.2f}")
print(f"Dataset rating std: {dataset_rating_std:.2f}")

# Define moderate extreme weighting function
def extreme_rating_weight(rating, mean_rating=6.49):
    """
    Apply extreme weighting to ratings:
    - Ratings far from mean get higher weight
    - Ratings close to mean get lower weight
    - Uses moderate polynomial scaling for reasonable results
    """
    deviation = abs(rating - mean_rating)
    # Use moderate polynomial scaling: weight = 1 + (deviation/2)^2
    weight = 1 + (deviation / 2.0) ** 2
    return weight

# Test the weighting function with dataset mean
test_ratings = [0.5, 2.0, 4.0, 5.0, 6.5, 8.0, 10.0]  # Using dataset scale (0.5-10.0)
print(f"\n🎯 Extreme weighting examples (dataset mean = {dataset_rating_mean:.1f}):")
for rating in test_ratings:
    weight = extreme_rating_weight(rating, dataset_rating_mean)
    print(f"Rating {rating:.1f}: Weight = {weight:.2f}x")

📊 Analyzing rating distribution for extreme weighting...
Your rating mean: 3.52
Your rating std: 1.02
Dataset rating mean: 6.49
Dataset rating std: 2.08

🎯 Extreme weighting examples (dataset mean = 6.5):
Rating 0.5: Weight = 9.96x
Rating 2.0: Weight = 6.03x
Rating 4.0: Weight = 2.55x
Rating 5.0: Weight = 1.55x
Rating 6.5: Weight = 1.00x
Rating 8.0: Weight = 1.57x
Rating 10.0: Weight = 4.09x


In [72]:
# Save the complete NEW movie recommendations to file (with 5⭐ bonus weighting)
filename = 'data/movie_recommendations_NEW_with_5star_bonus.txt'
with open(filename, 'w') as f:
    f.write("🎬 NEW MOVIE RECOMMENDATIONS (UNWATCHED) with 5⭐ BONUS\n")
    f.write("=" * 70 + "\n")
    f.write(f"Based on {len(my_ratings_updated)} of your movie ratings\n")
    f.write(f"Excluding {len(my_watched_movies_set)} movies you've already watched\n")
    f.write(f"Analyzed {len(filtered_user_ids_new)} users with similar taste\n")
    f.write(f"🌟 BONUS WEIGHTING for movies rated 5.0/5.0 (perfect Letterboxd ratings)!\n")
    f.write(f"Generated on: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
    f.write("✨ These are movies you HAVEN'T watched yet! ✨\n\n")
    
    for i, movie in enumerate(final_recommendations, 1):
        perfect_indicator = f" 🌟×{movie['perfect_ratings']}" if movie['perfect_ratings'] > 0 else ""
        bonus_text = f" (×{movie['perfect_bonus']:.1f} bonus)" if movie['perfect_bonus'] > 1.0 else ""
        
        f.write(f"{i:2d}. {movie['title']} ({movie['year']}){perfect_indicator}\n")
        f.write(f"    ⭐ Rating: {movie['avg_rating']:.2f}/5.0\n")
        f.write(f"    👥 Recommended by: {movie['num_recommenders']} similar users\n")
        f.write(f"    🎯 Similarity Score: {movie['avg_similarity']:.3f}\n")
        f.write(f"    📊 Combined Score: {movie['combined_score']:.2f}{bonus_text}\n")
        if movie['perfect_ratings'] > 0:
            f.write(f"    🌟 Perfect 5.0/5.0 ratings: {movie['perfect_ratings']}\n")
        f.write("\n")

print(f"✅ Saved {len(final_recommendations)} NEW recommendations (with 5⭐ bonus) to '{filename}'")

# Count movies with perfect ratings in final recommendations
perfect_movies_in_recs = sum(1 for movie in final_recommendations if movie['perfect_ratings'] > 0)
total_perfect_ratings = sum(movie['perfect_ratings'] for movie in final_recommendations)

# Updated summary statistics
print(f"\n📈 UPDATED RECOMMENDATION SYSTEM SUMMARY (with 5⭐ bonus):")
print(f"━" * 70)
print(f"Your Ratings:              {len(my_ratings_updated)} movies (0.5-5.0 Letterboxd scale)")
print(f"Total Movies Watched:      {len(my_watched_movies_set)} movies (rated + unrated)")
print(f"Movies Rated:             {len(rated_tmdb_ids)} movies")  
print(f"Movies Watched (No Rating): {len(my_watched_movies_set - rated_tmdb_ids)} movies")
print(f"Similar Users:             {len(filtered_user_ids_new)} users found")
print(f"Avg Movies Shared:         {common_movies_count_new.mean():.1f} movies per user")
print(f"Top Similarity:            {user_similarity_new[0][top_similar_indices_new[0]]:.3f}")
print(f"NEW Recommendations:       {len(final_recommendations)} movies you haven't watched")
print(f"🌟 Movies with 5⭐ bonus:   {perfect_movies_in_recs} movies ({total_perfect_ratings} perfect ratings)")
print(f"Database Match Rate:       {len(watched_matched)/len(watched_movies)*100:.1f}% (watched movies)")
print(f"\n🎯 All recommendations are movies you've NEVER seen before!")
print(f"🌟 Movies rated 5.0/5.0 by similar users get 1.5x-2.5x bonus weighting!")
print(f"📊 Perfect 5-star ratings in dataset: {five_star_ratings:,} ({five_star_ratings/total_ratings*100:.2f}%)")

✅ Saved 50 NEW recommendations (with 5⭐ bonus) to 'data/movie_recommendations_NEW_with_5star_bonus.txt'

📈 UPDATED RECOMMENDATION SYSTEM SUMMARY (with 5⭐ bonus):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Your Ratings:              239 movies (0.5-5.0 Letterboxd scale)
Total Movies Watched:      426 movies (rated + unrated)
Movies Rated:             208 movies
Movies Watched (No Rating): 229 movies
Similar Users:             7019 users found
Avg Movies Shared:         78.4 movies per user
Top Similarity:            0.323
NEW Recommendations:       50 movies you haven't watched
🌟 Movies with 5⭐ bonus:   32 movies (36 perfect ratings)
Database Match Rate:       108.5% (watched movies)

🎯 All recommendations are movies you've NEVER seen before!
🌟 Movies rated 5.0/5.0 by similar users get 1.5x-2.5x bonus weighting!
📊 Perfect 5-star ratings in dataset: 1,110,093 (10.06%)


## Step 6: Save Results and Summary