# Letterboxd Movie Recommender: Notebook

This notebook is a **complete, runnable starter** for the Letterboxd recommender project. It follows the architecture: Data load → Cleaning → TF-IDF (content) → LDA → Sentiment → Collaborative (SVD/item-item) → Hybrid recommendation.

Place your CSV files in `letterboxd-movie-ratings-data/` with filenames:
- `movie_data.csv`
- `ratings_export.csv`
- `users_export.csv`

Run the cells in order. The notebook contains fallbacks so it runs even if some optional packages are missing.

## 1) Setup: Install dependencies (run once)

The following cell imports the required packages. If any are missing (for example on Colab), install them with pip first.


In [1]:
import pandas as pd
import numpy as np
from scipy.sparse import load_npz, vstack
import joblib
import os

In [2]:
merged = pd.read_csv("merged_movies_ratings_users.csv")
movies = pd.read_csv("clean_movies.csv")

In [3]:
print("merged",
      merged.columns,
      "\nmovies.columns:",
      movies.columns,)

merged Index(['movie_id', 'username', 'movie_name', 'genres', 'rating', 'description',
       'year'],
      dtype='object') 
movies.columns: Index(['movie_id', 'movie_name', 'genres', 'description', 'year', 'popularity',
       'vote_average', 'vote_count'],
      dtype='object')


In [4]:
merged.head()

Unnamed: 0,movie_id,username,movie_name,genres,rating,description,year
0,feast-2014,deathproof,Feast,"[""Animation"",""Comedy"",""Drama"",""Family""]",7,This Oscar-winning animated short film tells t...,2014.0
1,loving-2016,deathproof,Loving,"[""Romance"",""Drama""]",7,"The story of Richard and Mildred Loving, an in...",2016.0
2,scripted-content,deathproof,Scripted Content,"[""Comedy""]",7,A very short film for Vogue starring Jessica C...,2014.0
3,the-future,deathproof,The Future,"[""Drama"",""Fantasy"",""Romance""]",4,When a couple decides to adopt a stray cat the...,2011.0
4,mank,deathproof,Mank,"[""Drama"",""History""]",5,1930s Hollywood is reevaluated through the eye...,2020.0


## 9) Load models (optional)


In [5]:
topk_indices = np.load('topk_data/topk_indices.npy')
topk_scores = np.load('topk_data/topk_scores.npy')
print("Top-K shapes:", topk_indices.shape, topk_scores.shape)

Top-K shapes: (285963, 10000) (285963, 10000)


In [6]:
print("topk_scores",
      topk_scores,
      "\ntopk_indices",
      topk_indices)

topk_scores [[0.3175046  0.2466452  0.24578768 ... 0.02833881 0.0283383  0.02833725]
 [0.44509855 0.41655144 0.31313947 ... 0.02684237 0.02684129 0.02683994]
 [0.51007384 0.39329883 0.38868892 ... 0.02717405 0.02717189 0.0271715 ]
 ...
 [0.69548976 0.47076145 0.47064832 ... 0.         0.         0.        ]
 [0.6949353  0.62749654 0.58891606 ... 0.01979467 0.01979269 0.01978976]
 [0.49202397 0.47031456 0.43653703 ... 0.         0.         0.        ]] 
topk_indices [[236856 238306 207249 ... 185477 260989 234206]
 [252069  98307 116085 ...  76861  66098 243644]
 [ 96408 241070 228775 ... 272778 188872 161629]
 ...
 [159538 234413 163953 ...  92946  92947  92948]
 [ 85836 190023  80402 ... 203198  45521  52017]
 [ 29335  60155 248170 ...  92220  92219  92218]]
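
The two arrays are parallel: entry `[i, j]` of `topk_indices` names a neighbor row and the same entry of `topk_scores` holds its similarity, with scores in descending order as the printout shows. The sketch below reads the first few neighbors for one row. It assumes row `i` corresponds to the `i`-th row of `movies` (the recommender function later in this notebook builds its index that way); `top_neighbors` and the small demo arrays are hypothetical stand-ins, not part of the saved data.

```python
import numpy as np

def top_neighbors(row, topk_indices, topk_scores, n=5):
    """Return (neighbor_row, score) pairs for one movie row, best first."""
    idx = topk_indices[row, :n]
    sc = topk_scores[row, :n]
    return [(int(i), float(s)) for i, s in zip(idx, sc)]

# Tiny stand-in arrays with the same layout as the real (285963, 10000) ones.
demo_indices = np.array([[2, 1], [0, 2], [0, 1]])
demo_scores = np.array([[0.9, 0.4], [0.4, 0.1], [0.9, 0.1]])

print(top_neighbors(0, demo_indices, demo_scores, n=2))
```

With the real arrays, the neighbor rows could be mapped back to titles with `movies.iloc[...]['movie_name']`.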


In [7]:
# Load SVD collaborative model
collab_model = joblib.load('svd_model.pkl')

In [8]:
collab_model

{'type': 'surprise_svd',
 'model': <surprise.prediction_algorithms.matrix_factorization.SVD at 0x24493de9d20>}

In [9]:
fused_batches_dir = 'fused_batches'
batch_files = sorted([os.path.join(fused_batches_dir, f) for f in os.listdir(fused_batches_dir) if f.endswith('.npz')])

In [10]:
batch_files

['fused_batches\\fused_batch_0.npz',
 'fused_batches\\fused_batch_1.npz',
 'fused_batches\\fused_batch_10.npz',
 'fused_batches\\fused_batch_11.npz',
 'fused_batches\\fused_batch_12.npz',
 'fused_batches\\fused_batch_13.npz',
 'fused_batches\\fused_batch_14.npz',
 'fused_batches\\fused_batch_15.npz',
 'fused_batches\\fused_batch_16.npz',
 'fused_batches\\fused_batch_17.npz',
 'fused_batches\\fused_batch_18.npz',
 'fused_batches\\fused_batch_19.npz',
 'fused_batches\\fused_batch_2.npz',
 'fused_batches\\fused_batch_20.npz',
 'fused_batches\\fused_batch_21.npz',
 'fused_batches\\fused_batch_22.npz',
 'fused_batches\\fused_batch_23.npz',
 'fused_batches\\fused_batch_24.npz',
 'fused_batches\\fused_batch_25.npz',
 'fused_batches\\fused_batch_26.npz',
 'fused_batches\\fused_batch_27.npz',
 'fused_batches\\fused_batch_28.npz',
 'fused_batches\\fused_batch_3.npz',
 'fused_batches\\fused_batch_4.npz',
 'fused_batches\\fused_batch_5.npz',
 'fused_batches\\fused_batch_6.npz',
 'fused_batches\\fu

In [11]:
final_batches = [load_npz(f) for f in batch_files]
final_matrix = vstack(final_batches)
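
Note that `sorted()` orders the file names lexicographically, so in the listing above `fused_batch_10.npz` comes before `fused_batch_2.npz`. If the batches were written in numeric order (an assumption; the code that wrote them is not shown here), `vstack` would then stack rows in a different order than the movie index. A minimal sketch of a numeric sort key follows; `batch_number` is a hypothetical helper name.

```python
import re

def batch_number(path):
    """Extract the integer N from a name like 'fused_batch_N.npz'."""
    match = re.search(r"fused_batch_(\d+)\.npz$", path)
    if match is None:
        raise ValueError(f"unexpected batch file name: {path}")
    return int(match.group(1))

names = [
    "fused_batch_0.npz",
    "fused_batch_1.npz",
    "fused_batch_10.npz",
    "fused_batch_2.npz",
]
ordered = sorted(names, key=batch_number)
print(ordered)
```

In the notebook this would amount to `batch_files = sorted(batch_files, key=batch_number)` before loading.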

In [12]:
final_batches

[<Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 396536 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 399151 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 399840 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 399799 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 400621 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 395996 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 400712 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 398417 stored elements and shape (10000, 100011)>,
 <Compressed Sparse Row sparse matrix of dtype 'float64'
 	with 401200 stored el

In [13]:
final_matrix

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 11056299 stored elements and shape (285963, 100011)>

## 10) Hybrid recommendation function (content + collaborative)

In [14]:
print("merged.columns:", merged.columns)

merged.columns: Index(['movie_id', 'username', 'movie_name', 'genres', 'rating', 'description',
       'year'],
      dtype='object')


In [15]:
print("movies.columns:", movies.columns)

movies.columns: Index(['movie_id', 'movie_name', 'genres', 'description', 'year', 'popularity',
       'vote_average', 'vote_count'],
      dtype='object')


In [27]:
def recommend_for_user_topk(username, merged_df, movies_df, topk_indices, topk_scores, collab_model, 
                            top_n=10, weight_collab=0.5, weight_popularity=0.3):
    # 1. Get the user's ratings
    user_ratings = merged_df[merged_df['username'].astype(str).str.lower() == str(username).lower()]
    if user_ratings.empty:
        print("User not found or no ratings.")
        return pd.DataFrame()

    seen = set(user_ratings['movie_id'].unique())
    candidates = movies_df[~movies_df['movie_id'].isin(seen)].copy()
    if candidates.empty:
        return pd.DataFrame()

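    # 2. Content-based score (from Top-K similarity)
    # Note: this averages each candidate's own top-K scores; it does not use
    # the user's seen films or topk_indices, so it is the same for every user.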
    movie_index = {m: i for i, m in enumerate(movies_df['movie_id'].tolist())}
    candidate_indices = [movie_index[m] for m in candidates['movie_id'] if m in movie_index]

    content_scores = []
    for idx in candidate_indices:
        sims = topk_scores[idx]
        content_scores.append(float(np.mean(sims)))
    content_scores = np.array(content_scores)

    # 3. Collaborative filtering score
    collab_scores = np.zeros(len(candidates), dtype=float)
    if collab_model['type'] == 'surprise_svd':
        algo = collab_model['model']
        for i, mid in enumerate(candidates['movie_id']):
            try:
                pred = algo.predict(uid=username, iid=mid)
                collab_scores[i] = pred.est
            except Exception:
                collab_scores[i] = 0.0
    else:
        sim_matrix = collab_model['sim_matrix']
        id_to_index = collab_model['id_to_index']
        seen_indices = [id_to_index.get(m) for m in seen if id_to_index.get(m) is not None]
        for i, mid in enumerate(candidates['movie_id']):
            idx = id_to_index.get(mid)
            if idx is None or not seen_indices:
                collab_scores[i] = 0.0
            else:
                sims = sim_matrix[idx, seen_indices]
                collab_scores[i] = float(np.mean(sims)) if len(sims) > 0 else 0.0

    # 4. Normalization helper
    def norm(a):
        a = np.array(a, dtype=float)
        if a.max() == a.min():
            return np.zeros_like(a)
        return (a - a.min()) / (a.max() - a.min())

    c1 = norm(content_scores)
    c2 = norm(collab_scores)

    # 5. Popularity boost (Letterboxd)
    if 'vote_average' in candidates.columns and 'vote_count' in candidates.columns:
        popularity_boost = np.log1p(candidates['vote_count'].fillna(0)) * candidates['vote_average'].fillna(0)
        c3 = norm(popularity_boost)
    else:
        c3 = np.zeros(len(candidates))

    # 6. Combine all scores
    #   final = (1 - weight_collab - weight_popularity)*content + weight_collab*collab + weight_popularity*popularity
    w_content = 1 - weight_collab - weight_popularity
    final = w_content * c1 + weight_collab * c2 + weight_popularity * c3

    # 7. Prepare results
    candidates = candidates.reset_index(drop=True)
    candidates['content_score'] = c1
    candidates['collab_score'] = c2
    candidates['popularity_score'] = c3
    candidates['final_score'] = final

    # 8. Return top-N with movie_name, year, rating info
    cols = ['movie_name','description', 'year', 'vote_average', 'vote_count', 
            'content_score', 'collab_score', 'popularity_score', 'final_score']
    for c in cols:
        if c not in candidates.columns:
            candidates[c] = None

    return candidates.sort_values('final_score', ascending=False).head(top_n)[cols]


In [28]:
example_user = merged['username'].dropna().astype(str).iloc[0]
recos = recommend_for_user_topk(example_user, merged, movies, topk_indices, topk_scores, collab_model, top_n=10, weight_collab=0.7)
if not recos.empty:
    display(recos)

Unnamed: 0,movie_name,description,year,vote_average,vote_count,content_score,collab_score,popularity_score,final_score
84473,The Empire Strikes Back,empire strike back empire strike back epic sag...,1980.0,8.4,13524.0,0.229719,0.868782,0.954879,0.894611
192647,The Silence of the Lambs,silence lamb silence lamb clarice starling top...,1991.0,8.3,12567.0,0.234908,0.865171,0.936233,0.886489
6058,Life Is Beautiful,life beautiful life beautiful touching story i...,1997.0,8.5,10571.0,0.293478,0.832868,0.941225,0.865375
197622,Wild Tales,wild tale wild tale six deadly story explore e...,2014.0,7.9,2467.0,0.256807,0.915707,0.73744,0.862227
156930,V for Vendetta,vendetta vendetta world great britain become f...,2005.0,7.9,11656.0,0.220366,0.836083,0.884009,0.850461
22621,Kizumonogatari Part 2: Nekketsu,kizumonogatari part nekketsu kizumonogatari pa...,2016.0,8.1,117.0,0.285082,0.999798,0.461795,0.838397
154645,Moulin Rouge!,moulin rouge 2001 moulin rouge celebration lov...,2001.0,7.6,3660.0,0.18775,0.877592,0.74525,0.837889
30970,The Godfather: Part II,godfather part godfather part continuing saga ...,1974.0,8.6,9035.0,0.265044,0.791682,0.936163,0.835026
68741,10 Things I Hate About You,thing hate thing hate first day new school cam...,1999.0,7.6,6128.0,0.272571,0.853248,0.792051,0.834889
52055,Terminator 2: Judgment Day,terminator judgment day terminator judgment da...,1991.0,8.1,9815.0,0.249247,0.807602,0.88975,0.832247
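
In the call above, `weight_collab=0.7` combined with the default `weight_popularity=0.3` leaves a content weight of `1 - 0.7 - 0.3 = 0`. The first row confirms it: `0.7 * 0.868782 + 0.3 * 0.954879 ≈ 0.894611`, so `content_score` does not influence the ranking. The sketch below shows the same min-max blend with an explicit check on the weights; `minmax` and `blend` are hypothetical names, and the check is an added safeguard rather than something the notebook's function does.

```python
import numpy as np

def minmax(a):
    """Scale values to [0, 1]; a constant array maps to all zeros."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return np.zeros_like(a) if span == 0 else (a - a.min()) / span

def blend(content, collab, popularity, weight_collab=0.5, weight_popularity=0.3):
    """Weighted sum of the three normalized signals."""
    w_content = 1.0 - weight_collab - weight_popularity
    if w_content < -1e-9:
        raise ValueError("weight_collab + weight_popularity must not exceed 1")
    w_content = max(w_content, 0.0)  # clear tiny floating-point residue
    return (w_content * minmax(content)
            + weight_collab * minmax(collab)
            + weight_popularity * minmax(popularity))

# Two made-up candidates; with these weights the content signal is ignored.
scores = blend([0.1, 0.3], [2.0, 4.0], [10.0, 5.0],
               weight_collab=0.7, weight_popularity=0.3)
print(scores)
```

Passing, say, `weight_collab=0.5, weight_popularity=0.2` would give the content signal a weight of 0.3.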


In [18]:
print(recos['vote_count'].max())

13524.0


In [19]:
from IPython.display import display, HTML

def show_recommendations(recos, movies):
    """
    Display movie recommendations with poster, title, year, and rating info.
    """
    if recos is None or recos.empty:
        display(HTML("<p style='color:red;'>No recommendations to display.</p>"))
        return

    html = """
    <style>
        .movie-card {
            display: flex;
            align-items: center;
            margin-bottom: 10px;
            background-color: #f8f9fa;
            border-radius: 10px;
            padding: 10px;
            box-shadow: 0 1px 4px rgba(0,0,0,0.1);
        }
        .movie-card img {
            border-radius: 8px;
            width: 80px;
            height: 120px;
            object-fit: cover;
            margin-right: 12px;
        }
        .movie-info {
            font-family: Arial, sans-serif;
        }
        .movie-title {
            font-size: 16px;
            font-weight: bold;
        }
        .movie-meta {
            font-size: 13px;
            color: #555;
        }
    </style>
    <h3>🎬 Recommended Movies</h3>
    """

    for _, row in recos.iterrows():
        movie = movies[movies['movie_id'] == row['movie_id']].iloc[0] if 'movie_id' in recos.columns else row
        img = movie.get('image_url', '') or "https://via.placeholder.com/100x150?text=No+Image"
        year = movie.get('year', 'N/A')
        rating = movie.get('vote_average', 'N/A')
        votes = movie.get('vote_count', 'N/A')

        html += f"""
        <div class="movie-card">
            <img src="{img}">
            <div class="movie-info">
                <div class="movie-title">{row['movie_name']} ({year})</div>
                <div class="movie-meta">
                    ⭐ Rating: {rating} ({votes} votes)<br>
                    🧠 Content: {row['content_score']:.3f} |
                    👥 Collab: {row['collab_score']:.3f} |
                    🔥 Final: {row['final_score']:.3f}
                </div>
            </div>
        </div>
        """

    display(HTML(html))
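
The function above interpolates dataset text straight into HTML, so a title containing `&` or `<` could render incorrectly. A minimal sketch of escaping with the standard library follows. `movie_card` and the sample title are hypothetical, and `escape` is imported by name because the function above already uses `html` as a local variable.

```python
from html import escape

def movie_card(title, year, final_score):
    """Build one <li> entry, escaping text that comes from the dataset."""
    safe_title = escape(str(title))
    safe_year = escape(str(year))
    return f"<li><b>{safe_title}</b> ({safe_year}), final score {final_score:.3f}</li>"

# Made-up title chosen to contain characters that need escaping.
print(movie_card("Fast & Furious <Extended>", 2009, 0.81234))
```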



In [None]:
show_recommendations(recos, movies)

In [None]:
from IPython.display import HTML, display

test_users = ['filipe_furtado', 'abluevelvets', 'riverjphoenix', 'jay']

for user in test_users:
    print(f"\n🔹 Recommendations for {user}:")
    recos = recommend_for_user_topk(
        username=user,
        merged_df=merged,
        movies_df=movies,
        topk_indices=topk_indices,
        topk_scores=topk_scores,
        collab_model=collab_model,
        top_n=5,
        weight_collab=0.7
    )
    
    if recos is None or recos.empty:
        print("⚠️ No recommendations found (user not in dataset or no ratings).")
        continue

    # Pretty display
    top_recos = recos.sort_values('final_score', ascending=False).head(5)
    html = "<ul>"
    for _, row in top_recos.iterrows():
        html += f"<li><b>{row['movie_name']}</b> - Final Score: {row['final_score']:.3f}</li>"
    html += "</ul>"
    display(HTML(html))


In [None]:
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

class FastLetterboxdRecommender:
    def __init__(self, topk_indices_path, topk_scores_path, svd_model_path, movies_df_path):
        """Store the pre-loaded arrays, model, and DataFrame (objects, not file paths, despite the parameter names)."""
        print("Loading models...")
        self.topk_indices = topk_indices_path
        self.topk_scores = topk_scores_path
        self.collab_model = svd_model_path
        self.movies_df = movies_df_path
        self.movie_index = {m: i for i, m in enumerate(self.movies_df['movie_id'].tolist())}
        
        self.base_url = "https://letterboxd.com"
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        }
        print("✓ Models loaded successfully\n")
    
    def get_user_sample_films(self, username, max_films=50, strategy='smart'):
        """
        Fast sampling strategies:
        - 'smart': Get recent + highly rated films only
        - 'random': Random sample across all pages
        - 'first_pages': Just first few pages
        """
        print(f"🎬 Fetching sample films for user: {username}")
        print(f"Strategy: {strategy}, Max films: {max_films}\n")
        
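        if strategy == 'smart':
            return self._get_smart_sample(username, max_films)
        elif strategy == 'random':
            # No _get_random_sample method is defined in this notebook,
            # so fall back to the first pages instead of raising AttributeError.
            print("  'random' sampling is not implemented; using first pages instead.")
            return self._get_first_pages(username, max_films)
        else:
            return self._get_first_pages(username, max_films)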
    
    def _get_smart_sample(self, username, max_films):
        """
        Smart sampling: Get highest rated + most recent films
        This gives best representation of user taste
        """
        films = []
        
        # 1. Get highly rated films (5 stars, 4.5 stars)
        print("  → Fetching highly rated films...")
        for rating in ['rated/5', 'rated/4.5']:
            url = f"{self.base_url}/{username}/films/{rating}/"
            page_films = self._scrape_page_fast(url, limit=max_films // 3)
            films.extend(page_films)
            if len(films) >= max_films:
                break
        
        # 2. Get recent films (diary - recently watched)
        if len(films) < max_films:
            print("  → Fetching recent films from diary...")
            url = f"{self.base_url}/{username}/films/diary/"
            recent_films = self._scrape_page_fast(url, limit=max_films - len(films))
            films.extend(recent_films)
        
        # Remove duplicates
        seen = set()
        unique_films = []
        for f in films:
            if f['movie_id'] not in seen:
                seen.add(f['movie_id'])
                unique_films.append(f)
        
        print(f"  ✓ Collected {len(unique_films)} unique films\n")
        return unique_films[:max_films]
    
    def _get_first_pages(self, username, max_films):
        """Just scrape first few pages - fastest but less accurate"""
        print("  → Fetching first pages...")
        url = f"{self.base_url}/{username}/films/"
        films = self._scrape_page_fast(url, limit=max_films)
        print(f"  ✓ Collected {len(films)} films\n")
        return films
    
    def _scrape_page_fast(self, url, limit=50):
        """Fast scraping - only get movie IDs and ratings, skip detailed info"""
        films = []
        page = 1
        
        while len(films) < limit:
            page_url = f"{url}page/{page}/" if page > 1 else url
            
            try:
                response = requests.get(page_url, headers=self.headers, timeout=5)
                soup = BeautifulSoup(response.content, 'html.parser')
                
                movie_items = soup.find_all('div', class_='react-component', 
                                           attrs={'data-target-link': True})
                
                if not movie_items:
                    break
                
                for item in movie_items:
                    if len(films) >= limit:
                        break
                    
                    target_link = item.get('data-target-link', '')
                    if '/film/' not in target_link:
                        continue
                    
                    movie_slug = target_link.strip('/').split('/')[-1]
                    
                    # Extract rating from parent if exists
                    rating = self._extract_rating_fast(item)
                    
                    films.append({
                        'movie_id': movie_slug,
                        'rating': rating
                    })
                
                if len(films) >= limit or len(movie_items) == 0:
                    break
                
                page += 1
                time.sleep(0.3)  # Minimal delay
                
            except Exception as e:
                print(f"    Error on page {page}: {e}")
                break
        
        return films
    
    def _extract_rating_fast(self, item):
        """Quick rating extraction"""
        parent = item.find_parent('li')
        if parent:
            rating_span = parent.find('span', class_=lambda x: x and 'rated-' in str(x))
            if rating_span:
                classes = rating_span.get('class', [])
                for cls in classes:
                    if cls.startswith('rated-'):
                        try:
                            rating_value = int(cls.split('-')[1])
                            return rating_value / 2.0
                        except (ValueError, IndexError):
                            pass
        return None
    
    def recommend_for_new_user(self, username, user_films, top_n=10, 
                               weight_collab=0.3, weight_popularity=0.4):
        """
        Generate recommendations for a new Letterboxd user
        
        Since user is not in training data:
        - Reduce collab weight (can't use SVD effectively)
        - Increase content + popularity weights
        """
        print(f"🎯 Generating recommendations for {username}...\n")
        
        # Get movie IDs user has seen
        seen_ids = set([f['movie_id'] for f in user_films])
        
        # Filter candidates (movies not seen)
        candidates = self.movies_df[~self.movies_df['movie_id'].isin(seen_ids)].copy()
        
        if candidates.empty:
            print("No candidates found!")
            return pd.DataFrame()
        
        # Map seen films to indices
        seen_indices = [self.movie_index[mid] for mid in seen_ids 
                       if mid in self.movie_index]
        
        if not seen_indices:
            print("⚠️  No matching films found in training data")
            print("Returning popular films instead...\n")
            return self._get_popular_recommendations(top_n)
        
        # ===== CONTENT-BASED SCORING =====
        print("  → Computing content-based scores...")
        candidate_indices = [self.movie_index[m] for m in candidates['movie_id'] 
                           if m in self.movie_index]
        
        content_scores = []
        for idx in candidate_indices:
            # Get similarity to user's watched films
            similarities = []
            for seen_idx in seen_indices:
                # Find if seen_idx is in topk neighbors of idx
                neighbors = self.topk_indices[idx]
                if seen_idx in neighbors:
                    pos = np.where(neighbors == seen_idx)[0][0]
                    similarities.append(self.topk_scores[idx][pos])
            
            if similarities:
                content_scores.append(np.mean(similarities))
            else:
                content_scores.append(0.0)
        
        content_scores = np.array(content_scores)
        
        # ===== PSEUDO-COLLABORATIVE SCORE =====
        # Use average rating of similar films
        print("  → Computing collaborative scores...")
        collab_scores = np.zeros(len(candidates))
        
        user_ratings = [f['rating'] for f in user_films if f['rating'] is not None]
        if user_ratings:
            avg_user_rating = np.mean(user_ratings)
            # Weight by content similarity
            collab_scores = content_scores * avg_user_rating
        
        # ===== POPULARITY SCORE =====
        print("  → Computing popularity scores...")
        if 'vote_average' in candidates.columns and 'vote_count' in candidates.columns:
            popularity = (np.log1p(candidates['vote_count'].fillna(0)) * 
                         candidates['vote_average'].fillna(0))
        else:
            popularity = np.zeros(len(candidates))
        
        # ===== NORMALIZATION =====
        def normalize(arr):
            arr = np.array(arr, dtype=float)
            if arr.max() == arr.min():
                return np.zeros_like(arr)
            return (arr - arr.min()) / (arr.max() - arr.min())
        
        c1 = normalize(content_scores)
        c2 = normalize(collab_scores)
        c3 = normalize(popularity)
        
        # ===== FINAL SCORE =====
        w_content = 1 - weight_collab - weight_popularity
        final_score = w_content * c1 + weight_collab * c2 + weight_popularity * c3
        
        # ===== PREPARE RESULTS =====
        candidates = candidates.reset_index(drop=True)
        candidates['content_score'] = c1
        candidates['collab_score'] = c2
        candidates['popularity_score'] = c3
        candidates['final_score'] = final_score
        
        # Select columns for output
        output_cols = ['movie_id', 'movie_name', 'year', 'genres', 'vote_average', 'vote_count',
                      'content_score', 'collab_score', 'final_score']
        
        # Filter to only existing columns
        output_cols = [c for c in output_cols if c in candidates.columns]
        
        results = candidates.nlargest(top_n, 'final_score')[output_cols]
        
        print(f"  ✓ Generated {len(results)} recommendations\n")
        return results
    
    def _get_popular_recommendations(self, top_n):
        """Fallback: return popular films"""
        if 'vote_average' in self.movies_df.columns and 'vote_count' in self.movies_df.columns:
            popular = self.movies_df.copy()
            popular['popularity'] = (np.log1p(popular['vote_count'].fillna(0)) * 
                                    popular['vote_average'].fillna(0))
            return popular.nlargest(top_n, 'popularity')[
                ['movie_id', 'movie_name', 'year', 'vote_average', 'vote_count']
            ]
        return self.movies_df.head(top_n)
    
    def recommend_realtime(self, username, max_films=50, top_n=10, strategy='smart'):
        """
        Main function: Real-time recommendations for any Letterboxd user
        
        Args:
            username: Letterboxd username
            max_films: Max films to sample (30-50 is optimal for speed)
            top_n: Number of recommendations to return
            strategy: 'smart', 'first_pages', or 'random'
        
        Returns:
            DataFrame with top-N recommendations
        """
        start_time = time.time()
        
        print("="*70)
        print(f"🎬 LETTERBOXD REAL-TIME RECOMMENDER")
        print("="*70)
        print(f"Username: {username}")
        print(f"Max sample size: {max_films}")
        print(f"Strategy: {strategy}\n")
        
        # Step 1: Fast scraping
        user_films = self.get_user_sample_films(username, max_films, strategy)
        
        if not user_films:
            print("❌ No films found for this user")
            return pd.DataFrame()
        
        # Step 2: Generate recommendations
        recommendations = self.recommend_for_new_user(
            username, user_films, top_n=top_n,
            weight_collab=0.2,  # Lower weight since user not in training
            weight_popularity=0.4  # Higher weight for popularity
        )
        
        elapsed = time.time() - start_time
        print("="*70)
        print(f"✅ COMPLETED in {elapsed:.2f} seconds")
        print("="*70)
        
        return recommendations
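
The rating extraction in `_extract_rating_fast` relies on a CSS class of the form `rated-N`, where `N` counts half stars (the method divides by 2). That parsing step can be sketched on its own, without any HTML. `stars_from_classes` is a hypothetical helper, and the class lists below are made-up examples of what a scraped element might carry.

```python
import re

# Matches only classes that end in digits, e.g. "rated-9".
RATED_CLASS = re.compile(r"^rated-(\d+)$")

def stars_from_classes(classes):
    """Return the star rating encoded in a class list, or None if absent."""
    for cls in classes:
        match = RATED_CLASS.match(cls)
        if match:
            return int(match.group(1)) / 2.0
    return None

print(stars_from_classes(["rating", "rated-9"]))   # half-star units: 9 means 4.5 stars
print(stars_from_classes(["rating", "unrated"]))   # no rating class present
```

Using a regular expression here avoids the need for a `try`/`except` around `int(...)`, since non-numeric suffixes simply do not match.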

In [None]:
# ============================================================================
# USAGE EXAMPLE
# ============================================================================

if __name__ == "__main__":
    # Initialize recommender with your saved models
    recommender = FastLetterboxdRecommender(
        topk_indices_path=topk_indices,
        topk_scores_path=topk_scores,
        svd_model_path=collab_model,
        movies_df_path=movies  # Your movies dataset
    )
    
    # Get real-time recommendations for any Letterboxd user
    username = "marwanmovies"  # Change to any username
    
    recommendations = recommender.recommend_realtime(
        username=username,
        max_films=40,  # Sample 40 most representative films
        top_n=10,      # Return top 10 recommendations
        strategy='smart'  # Use smart sampling
    )
    
    print("\n🎯 TOP RECOMMENDATIONS:")
    print("="*70)
    print(recommendations.to_string(index=False))
    
    # Save to CSV
    recommendations.to_csv(f'{username}_recommendations.csv', index=False)
    print(f"\n✅ Saved to {username}_recommendations.csv")
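
One property of `recommend_for_new_user` is worth keeping in mind when choosing weights: its collaborative score is the content score multiplied by the user's average rating, and min-max normalization cancels any positive constant factor. After normalization the two signals are therefore identical, and the final score behaves like `(w_content + weight_collab) * content + weight_popularity * popularity`. The sketch below checks this on made-up numbers; `minmax` mirrors the class's `normalize` helper and `avg_user_rating = 3.8` is an arbitrary illustrative value.

```python
import numpy as np

def minmax(a):
    """Same min-max scaling as the notebook's normalize helper."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return np.zeros_like(a) if span == 0 else (a - a.min()) / span

content = np.array([0.0, 0.2, 0.5])       # made-up content scores
avg_user_rating = 3.8                      # arbitrary positive average
pseudo_collab = content * avg_user_rating  # how the class derives its collab score

same_after_scaling = np.allclose(minmax(content), minmax(pseudo_collab))
print(same_after_scaling)
```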

In [31]:
recommender = FastLetterboxdRecommender(
       topk_indices_path=topk_indices,
       topk_scores_path=topk_scores,
       svd_model_path=collab_model,
       movies_df_path=movies  # Your movies dataset
   )

# Get recommendations (the run recorded below took about 167 seconds)
recs = recommender.recommend_realtime(
    username="marwanmovies",
    max_films=40,  # Optimal balance
    top_n=10,
    strategy='smart'
)

Loading models...
✓ Models loaded successfully

🎬 LETTERBOXD REAL-TIME RECOMMENDER
Username: marwanmovies
Max sample size: 40
Strategy: smart

🎬 Fetching sample films for user: marwanmovies
Strategy: smart, Max films: 40

  → Fetching highly rated films...
  → Fetching recent films from diary...
  ✓ Collected 37 unique films

🎯 Generating recommendations for marwanmovies...

  → Computing content-based scores...
  → Computing collaborative scores...
  → Computing popularity scores...
  ✓ Generated 10 recommendations

✅ COMPLETED in 167.21 seconds
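
The run above took 167.21 seconds. A likely contributor (not measured here) is the nested Python loop in the content-scoring step, which checks every seen film against every candidate's 10,000 neighbors. The sketch below computes the same quantity, the mean similarity over neighbors that are seen films, with `np.isin` in row chunks to bound memory. `content_scores_chunked` and the `chunk` parameter are hypothetical, and the sketch assumes each row of `topk_indices` holds no repeated neighbor.

```python
import numpy as np

def content_scores_chunked(topk_indices, topk_scores, candidate_rows, seen_rows, chunk=1024):
    """Mean similarity between each candidate and the seen films among its neighbors."""
    candidate_rows = np.asarray(candidate_rows)
    seen_rows = np.asarray(seen_rows)
    out = np.zeros(len(candidate_rows), dtype=float)
    for start in range(0, len(candidate_rows), chunk):
        rows = candidate_rows[start:start + chunk]
        hit = np.isin(topk_indices[rows], seen_rows)          # True where a neighbor was seen
        sums = np.where(hit, topk_scores[rows], 0.0).sum(axis=1)
        counts = hit.sum(axis=1)
        # Divide only where at least one seen film was found; other rows stay 0.
        np.divide(sums, counts, out=out[start:start + chunk], where=counts > 0)
    return out

# Tiny stand-in arrays: 3 movies, 2 neighbors each.
demo_indices = np.array([[1, 2], [0, 2], [0, 1]])
demo_scores = np.array([[0.5, 0.2], [0.5, 0.1], [0.2, 0.1]])
result = content_scores_chunked(demo_indices, demo_scores, [1, 2], [0], chunk=1)
print(result)
```

Inside the class this would be called with `self.topk_indices`, `self.topk_scores`, `candidate_indices`, and `seen_indices`.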
