Author: David Zelenay

# LLM Movie recommender


This notebook builds a hybrid **movie recommender system** that suggests movies to users. It works in four main stages:

1.  **Loads Data:** It starts by reading two datasets: one with movie information (titles, genres) and another with user ratings for movies.

2.  **Generates Initial Movie Ideas (Hybrid Approach):**
    *   **Content-Based:** It looks at the *content* of movies a user liked (their titles and genres) and finds other movies with similar text. It does this by converting each movie's title and genres into numerical representations (embeddings) and comparing them with cosine similarity.
    *   **Collaborative Filtering:** It finds other users who have *similar rating patterns* to the target user. Then, it suggests movies that these "similar users" liked but the target user hasn't seen yet.

3.  **Refines Recommendations with an LLM (RAG using Phi-4):**
    *   This is the core of the "RAG" (Retrieval-Augmented Generation) system.
    *   The script takes the list of potential movies generated above (both content-based and collaborative), along with the target user's liked and disliked movies.
    *   It sends all this information to a Large Language Model (Microsoft's Phi-4, run locally via Ollama).
    *   The LLM is instructed to act like a movie expert. Based on all the provided context, it selects a final set of recommendations *from the candidate list*, provides a reason for each recommendation, and assigns a confidence score.

4.  **Outputs Results:**
    *   The final output is a neatly structured list of recommended movies (in JSON format), including details like movie ID, title, genres, and the AI-generated reason for the recommendation.
    *   The script also includes functions to test the recommender with sample users and save the recommendations to a file.

In essence, it combines two traditional recommendation techniques to gather candidate movies and then uses an LLM to select, rank, and explain the best suggestions based on the user's taste and the available candidates.

In [1]:
import pandas as pd

In [2]:
movies_df = pd.read_parquet("../data/parquet/movies.parquet")
ratings_df = pd.read_parquet("../data/parquet/ratings.parquet")

# Display the first few rows of each
display(movies_df.head())
display(ratings_df.head())

   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy


   userId  movieId  rating            timestamp
0       1      296     5.0  2006-05-17 15:34:04
1       1      306     3.5  2006-05-17 12:26:57
2       1      307     5.0  2006-05-17 12:27:08
3       1      665     5.0  2006-05-17 15:13:40
4       1      899     3.5  2006-05-17 12:21:50


# RAG Movie Recommender System using Phi-4 on Ollama

This notebook implements a hybrid movie recommender system using Retrieval-Augmented Generation (RAG) with Microsoft's Phi-4 model running on Ollama. The system combines:

1. **Content-based filtering**: Recommending movies whose title-and-genre embeddings are similar to those of movies the user rated highly
2. **Collaborative filtering**: Recommending movies that similar users liked

We'll use the MovieLens 25M dataset (62,423 movies and 25,000,095 ratings, matching the shapes printed below) for this implementation and output a structured list of dictionaries with movie information.
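To make the collaborative half concrete, the sketch below runs it on a toy dataset: starting from the target user's neighbours (most similar first), it collects movies each neighbour rated at or above 3.5, the threshold this notebook later uses to separate liked from disliked movies, that the target has not rated. The dictionary layout, the `per_neighbour` cap, and the function name `collaborative_candidates` are illustrative assumptions, not the notebook's own code.

```python
def collaborative_candidates(ratings, target, neighbours, like_threshold=3.5, per_neighbour=10):
    """Collect movies that neighbours liked and `target` has not rated.

    ratings:    {user_id: {movie_id: rating}}  (hypothetical toy layout)
    neighbours: user IDs ordered from most to least similar to `target`
    Returns (movie_id, neighbour_id, rating) tuples, one per movie.
    """
    seen = set(ratings.get(target, {}))
    candidates, added = [], set()
    for neighbour in neighbours:
        liked = [
            (movie, rating)
            for movie, rating in ratings.get(neighbour, {}).items()
            if rating >= like_threshold and movie not in seen
        ]
        # Highest-rated first, so the per-neighbour cap keeps each neighbour's strongest picks
        liked.sort(key=lambda pair: pair[1], reverse=True)
        for movie, rating in liked[:per_neighbour]:
            if movie not in added:  # the first (most similar) neighbour wins on duplicates
                added.add(movie)
                candidates.append((movie, neighbour, rating))
    return candidates

# Made-up ratings for three users
toy_ratings = {
    "alice": {1: 5.0, 2: 3.0},
    "bob": {1: 4.5, 3: 4.0, 4: 2.0, 5: 5.0},
    "carol": {2: 4.0, 3: 3.5, 6: 4.5},
}
print(collaborative_candidates(toy_ratings, "alice", ["bob", "carol"]))
# [(5, 'bob', 5.0), (3, 'bob', 4.0), (6, 'carol', 4.5)]
```

Movie 4 is skipped because Bob rated it below the threshold, and movie 3 is attributed to Bob because he is the closer neighbour.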

In [3]:
# Import necessary libraries
import numpy as np
import json
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain_ollama import OllamaLLM
import random


## Data Exploration and Preprocessing

Let's explore our MovieLens dataset and prepare it for our recommender system.

In [4]:
# Check the structure of our datasets
print("Movies dataset shape:", movies_df.shape)
print("Ratings dataset shape:", ratings_df.shape)

# Display movie dataset columns
print("\nMovies columns:", movies_df.columns.tolist())

# Display ratings dataset columns
print("\nRatings columns:", ratings_df.columns.tolist())

# Check for missing values
print("\nMissing values in movies dataset:")
print(movies_df.isnull().sum())

print("\nMissing values in ratings dataset:")
print(ratings_df.isnull().sum())

Movies dataset shape: (62423, 3)
Ratings dataset shape: (25000095, 4)

Movies columns: ['movieId', 'title', 'genres']

Ratings columns: ['userId', 'movieId', 'rating', 'timestamp']

Missing values in movies dataset:
movieId    0
title      0
genres     0
dtype: int64

Missing values in ratings dataset:
userId       0
movieId      0
rating       0
timestamp    0
dtype: int64


In [5]:
# Prepare movie data for embedding
# Create a text representation of each movie including title and genres
movies_df['text_representation'] = movies_df['title'] + ' - ' + movies_df['genres'].str.replace('|', ' ', regex=False)

# Display examples
print("Movie text representations for embedding:")
print(movies_df[['movieId', 'text_representation']].head())

# Create a mapping from movie ID to title for easy lookup
movie_id_to_title = dict(zip(movies_df['movieId'], movies_df['title']))
movie_id_to_text = dict(zip(movies_df['movieId'], movies_df['text_representation']))

Movie text representations for embedding:
   movieId                                text_representation
0        1  Toy Story (1995) - Adventure Animation Childre...
1        2        Jumanji (1995) - Adventure Children Fantasy
2        3           Grumpier Old Men (1995) - Comedy Romance
3        4    Waiting to Exhale (1995) - Comedy Drama Romance
4        5        Father of the Bride Part II (1995) - Comedy


## Content-Based Recommendation using Embeddings

First, we'll create embeddings for each movie based on their title and genres.
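Cosine similarity is the dot product of two vectors divided by the product of their lengths, so once every embedding is scaled to unit length, comparing one movie against all others becomes a single matrix-vector product. The toy check below uses made-up 2-D vectors in place of the 384-dimensional `all-MiniLM-L6-v2` embeddings; `normalize_rows` and `top_k_by_cosine` are hypothetical helper names. The value it computes is the same quantity as `1 - cosine(u, v)` with `scipy.spatial.distance.cosine`.

```python
import numpy as np

def normalize_rows(matrix):
    """Scale each row to unit L2 norm; all-zero rows stay zero."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)

def top_k_by_cosine(embeddings, query_pos, k=5):
    """Return (row, similarity) pairs for the k rows most similar to row `query_pos`."""
    unit = normalize_rows(np.asarray(embeddings, dtype=float))
    sims = unit @ unit[query_pos]       # cosine similarity against every row at once
    sims[query_pos] = -np.inf           # never return the query itself
    order = np.argsort(-sims, kind="stable")[:k]
    return [(int(i), float(sims[i])) for i in order]

# Made-up 2-D vectors standing in for real sentence embeddings
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(top_k_by_cosine(toy, query_pos=0, k=2))  # rows 1 and 2, in that order
```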

In [6]:
# Load a sentence transformer model for creating embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create L2-normalized embeddings, so a dot product between two of them equals their cosine similarity
movie_texts = movies_df['text_representation'].tolist()
movie_embeddings = model.encode(movie_texts, normalize_embeddings=True)

# Map each movie ID to its row position in movie_embeddings (independent of the DataFrame index)
movie_ids_array = movies_df['movieId'].to_numpy()
movie_id_to_pos = {movie_id: pos for pos, movie_id in enumerate(movie_ids_array)}

# Function to find similar movies based on embeddings
def find_similar_movies(movie_id, top_n=10):
    if movie_id not in movie_id_to_pos:
        return []
    
    movie_pos = movie_id_to_pos[movie_id]
    
    # Cosine similarity with all movies in a single matrix-vector product
    similarities = movie_embeddings @ movie_embeddings[movie_pos]
    similarities[movie_pos] = -np.inf  # Skip the input movie itself
    
    # Sort by similarity (descending) and take top N
    top_positions = np.argsort(-similarities)[:top_n]
    return [(int(movie_ids_array[pos]), float(similarities[pos])) for pos in top_positions]

# Test the function with a movie
sample_movie_id = movies_df.iloc[2]['movieId']
print(f"Finding movies similar to: {movie_id_to_title[sample_movie_id]}")

similar_movies = find_similar_movies(sample_movie_id, top_n=5)
for movie_id, similarity in similar_movies:
    print(f"- {movie_id_to_title[movie_id]} (Similarity: {similarity:.4f})")

Finding movies similar to: Grumpier Old Men (1995)
- Grumpy Old Men (1993) (Similarity: 0.8635)
- The Grumpy (1986) (Similarity: 0.7364)
- The Grump (2014) (Similarity: 0.7226)
- Happier times, Grump (2018) (Similarity: 0.7026)
- Good Men, Good Women (1995) (Similarity: 0.6740)


## Collaborative Filtering

Next, we'll implement collaborative filtering to find similar users based on their rating patterns.

In [7]:
# Create a user-item matrix for collaborative filtering
# Convert ratings to a user-item matrix
def create_user_item_matrix():
    # Create a sparse matrix where rows are users and columns are movies
    user_ids = ratings_df['userId'].unique()
    movie_ids = movies_df['movieId'].unique()
    
    # Create mappings for matrix indices
    user_idx_map = {user_id: idx for idx, user_id in enumerate(user_ids)}
    movie_idx_map = {movie_id: idx for idx, movie_id in enumerate(movie_ids)}
    
    # Create the sparse matrix (vectorized ID-to-index lookups)
    rows = ratings_df['userId'].map(user_idx_map).to_numpy()
    cols = ratings_df['movieId'].map(movie_idx_map).to_numpy()
    data = ratings_df['rating'].to_numpy()
    
    matrix = csr_matrix((data, (rows, cols)), shape=(len(user_ids), len(movie_ids)))
    
    return matrix, user_ids, movie_ids, user_idx_map, movie_idx_map

# Function to find similar users based on rating patterns
def find_similar_users(user_id, matrix, user_ids, user_idx_map, top_n=10):
    if user_id not in user_idx_map:
        return []
    
    user_idx = user_idx_map[user_id]
    user_ratings = matrix[user_idx].toarray().flatten()
    
    # Calculate similarity with all users
    similarities = []
    for idx in range(len(user_ids)):
        if idx != user_idx and matrix[idx].nnz > 0:  # Skip the user itself and users with no ratings
            other_ratings = matrix[idx].toarray().flatten()
            
            # Find movies both users have rated
            common_mask = np.logical_and(user_ratings > 0, other_ratings > 0)
            
            # Only proceed if they have rated at least 5 common movies
            if np.sum(common_mask) >= 5:
                # Calculate cosine similarity on common ratings
                similarity = 1 - cosine(user_ratings[common_mask], other_ratings[common_mask])
                similarities.append((user_ids[idx], similarity))
    
    # Sort by similarity (descending) and take top N
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_n]

# Create the user-item matrix
user_item_matrix, unique_user_ids, unique_movie_ids, user_idx_map, movie_idx_map = create_user_item_matrix()

# Test the function with a user
sample_user_id = ratings_df['userId'].iloc[0]
print(f"Finding users similar to user ID: {sample_user_id}")

similar_users = find_similar_users(sample_user_id, user_item_matrix, unique_user_ids, user_idx_map, top_n=5)
for user_id, similarity in similar_users:
    print(f"- User ID: {user_id} (Similarity: {similarity:.4f})")

Finding users similar to user ID: 1
- User ID: 7266 (Similarity: 1.0000)
- User ID: 11414 (Similarity: 1.0000)
- User ID: 18722 (Similarity: 1.0000)
- User ID: 32501 (Similarity: 1.0000)
- User ID: 35146 (Similarity: 1.0000)
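The loop in `find_similar_users` converts one row per user to a dense array, which is slow on 25 million ratings. Every term of the common-item cosine is a sum over co-rated movies, so the same score can be computed for all users at once with four sparse matrix-vector products. The sketch below (hypothetical helpers `common_item_cosine` and `top_similar_users`) assumes, as `create_user_item_matrix` does, that ratings are positive and 0 means "not rated". It also helps read the 1.0000 scores above: when two users share only a handful of movies rated in the same proportions, the cosine over those movies is exactly 1.0, which is a likely explanation for the five-way tie.

```python
import numpy as np
from scipy.sparse import csr_matrix

def common_item_cosine(matrix, user_idx, min_common=5):
    """Cosine similarity between one user and every user, over co-rated movies only.

    `matrix` is a users x movies rating matrix in which 0 means "not rated".
    Returns (similarities, common_counts); a similarity is NaN for the user
    itself and for users sharing fewer than `min_common` rated movies.
    """
    ratings = csr_matrix(matrix, dtype=float)
    target = ratings[user_idx].toarray().ravel()     # target user's ratings (dense)
    target_mask = (target > 0).astype(float)         # 1.0 where the target rated

    rated = ratings.copy()
    rated.data = (rated.data > 0).astype(float)      # 1.0 where each user rated
    squared = ratings.multiply(ratings).tocsr()      # elementwise squared ratings

    dot = ratings @ target                           # sum of r_u * r_v over co-rated movies
    common = rated @ target_mask                     # number of co-rated movies
    target_sq = rated @ (target ** 2)                # sum of r_u^2 over co-rated movies
    other_sq = squared @ target_mask                 # sum of r_v^2 over co-rated movies

    with np.errstate(divide="ignore", invalid="ignore"):
        sims = dot / np.sqrt(target_sq * other_sq)
    sims[common < min_common] = np.nan
    sims[user_idx] = np.nan
    return sims, common

def top_similar_users(sims, user_ids, top_n=10):
    """Return (user_id, similarity) pairs for the top_n highest non-NaN similarities."""
    valid = np.flatnonzero(~np.isnan(sims))
    order = valid[np.argsort(-sims[valid], kind="stable")]
    return [(user_ids[i], float(sims[i])) for i in order[:top_n]]
```

With the notebook's objects, a call would look like `sims, _ = common_item_cosine(user_item_matrix, user_idx_map[sample_user_id])` followed by `top_similar_users(sims, unique_user_ids, top_n=5)`. The binary and squared matrices do not depend on the target user, so precomputing them once would avoid rebuilding them on every call.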


## Setting up Phi-4 with Ollama

Now, let's set up Microsoft's Phi-4 model, served locally through Ollama, to power our RAG system.

In [8]:
# Initialize Microsoft Phi-4 model
try:
    # If you have Azure API access, use the following
    # You'll need to set your Azure OpenAI API key and endpoint in environment variables
    # os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
    # os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint"
    
    # Azure OpenAI configuration for Phi-4
    # llm = AzureOpenAI(
    #     deployment_name="phi-4",  # The deployment name you chose when you deployed the model
    #     model_name="phi-4",
    #     temperature=0.7
    # )
    
    # Initialize Phi-4 model using Ollama
    llm = OllamaLLM(model="phi4")
    print("Successfully initialized Phi-4 model using Ollama")
    
    # Test the LLM
    response = llm.invoke("What are some popular movie genres?")
    print("\nTest LLM response:")
    print(response)
    
except Exception as e:
    print(f"Error initializing Phi-4 with Ollama: {e}")
    print("Please make sure Ollama is running and the Phi-4 model is pulled.")
    print("You can pull the model with: ollama pull phi4")

Successfully initialized Phi-4 model using Ollama

Test LLM response:
Some of the most popular movie genres include:

1. **Action** - Known for high energy, fast-paced sequences, and often featuring physical stunts, fights, or chases.

2. **Comedy** - Focuses on humor to entertain and amuse audiences, often involving funny situations, characters, or dialogues.

3. **Drama** - Explores serious themes and character development, dealing with realistic emotional experiences and relationships.

4. **Horror** - Designed to frighten and unsettle viewers through supernatural elements, suspenseful atmospheres, and frightening imagery.

5. **Romance** - Centers on love stories and romantic relationships between characters, often focusing on emotional connections and personal growth.

6. **Science Fiction (Sci-Fi)** - Involves futuristic or speculative concepts such as advanced technology, space exploration, time travel, and extraterrestrial life.

7. **Fantasy** - Features magical elements, myth

In [9]:
# Function to get a user's rated movies
def get_user_rated_movies(user_id, min_rating=3.5):
    # Get all ratings by this user
    user_ratings = ratings_df[ratings_df['userId'] == user_id]
    
    # Filter for liked movies (rating >= min_rating)
    liked_movies = user_ratings[user_ratings['rating'] >= min_rating]
    
    # Filter for disliked movies (rating < min_rating)
    disliked_movies = user_ratings[user_ratings['rating'] < min_rating]
    
    # Get the movie details
    liked_movie_details = [(movie_id, 
                           movie_id_to_title.get(movie_id, 'Unknown'), 
                           rating) 
                          for movie_id, rating in zip(liked_movies['movieId'], liked_movies['rating'])]
    
    disliked_movie_details = [(movie_id, 
                              movie_id_to_title.get(movie_id, 'Unknown'), 
                              rating) 
                             for movie_id, rating in zip(disliked_movies['movieId'], disliked_movies['rating'])]
    
    return liked_movie_details, disliked_movie_details

## RAG Movie Recommender System

Now let's combine everything to create our hybrid RAG movie recommender system.
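Combining the two sources comes down to a deduplicating merge in which content-based candidates are inserted first, then collaborative ones, skipping anything the user already rated. The sketch below also writes each candidate's movie ID into its prompt line, so an LLM asked to return `movie_id` can copy a real ID rather than invent one. The tuple layouts mirror the notebook's (`(movie_id, similarity)` for content-based, `(movie_id, title, rating)` for collaborative); `merge_candidates` and `candidate_prompt_lines` are hypothetical helpers, and the scores in the example are made up.

```python
def merge_candidates(content_recs, collab_recs, rated_ids, titles):
    """Merge candidates, content-based first, dropping duplicates and already-rated movies.

    content_recs: [(movie_id, similarity)]
    collab_recs:  [(movie_id, title, rating)]
    titles:       {movie_id: title}
    Returns {movie_id: (title, score, source)} in insertion order.
    """
    merged = {}
    for movie_id, similarity in content_recs:
        if movie_id not in merged and movie_id not in rated_ids:
            merged[movie_id] = (titles.get(movie_id, "Unknown"), similarity, "content")
    for movie_id, title, rating in collab_recs:
        if movie_id not in merged and movie_id not in rated_ids:
            merged[movie_id] = (title, rating, "collaborative")
    return merged

def candidate_prompt_lines(merged, genres_by_id):
    """Render one prompt line per candidate, starting with its movie ID."""
    return "\n".join(
        f"- [ID {movie_id}] {title} (Genres: {genres_by_id.get(movie_id, 'Unknown')}) (Source: {source})"
        for movie_id, (title, _score, source) in merged.items()
    )

titles = {1: "Toy Story (1995)", 2: "Jumanji (1995)", 3: "Grumpier Old Men (1995)"}
genres = {1: "Adventure, Animation, Children, Comedy, Fantasy",
          2: "Adventure, Children, Fantasy", 3: "Comedy, Romance"}
merged = merge_candidates(
    content_recs=[(2, 0.81), (1, 0.75)],
    collab_recs=[(2, "Jumanji (1995)", 4.0), (3, "Grumpier Old Men (1995)", 4.5)],
    rated_ids={1},
    titles=titles,
)
print(candidate_prompt_lines(merged, genres))
# - [ID 2] Jumanji (1995) (Genres: Adventure, Children, Fantasy) (Source: content)
# - [ID 3] Grumpier Old Men (1995) (Genres: Comedy, Romance) (Source: collaborative)
```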

In [10]:
# Create a RAG-based movie recommender that combines user history and similar users
def get_rag_recommendations(user_id, num_recommendations=5):
    # Get the user's rating history
    liked_movies, disliked_movies = get_user_rated_movies(user_id)
    
    # If user has no ratings, return popular movies
    if not liked_movies and not disliked_movies:
        print(f"User {user_id} has no ratings. Recommending popular movies instead.")
        # Get popular movies (high average rating and many ratings)
        movie_stats = ratings_df.groupby('movieId').agg({
            'rating': ['mean', 'count']
        })
        movie_stats.columns = ['avg_rating', 'num_ratings']
        popular_movies = movie_stats[movie_stats['num_ratings'] > 50].sort_values('avg_rating', ascending=False)
        popular_movie_ids = popular_movies.head(num_recommendations).index.tolist()
        
        # Create a list of dictionaries for popular movies
        result = []
        for movie_id in popular_movie_ids:
            movie_title = movie_id_to_title.get(movie_id, 'Unknown')
            genres = movies_df[movies_df['movieId'] == movie_id]['genres'].values[0].replace('|', ', ')
            result.append({
                'movie_id': int(movie_id),
                'title': movie_title,
                'genres': genres,
                'rating': float(popular_movies.loc[movie_id, 'avg_rating']),
                'recommendation_reason': 'Popular movie with high average rating',
                'recommendation_source': 'popularity'
            })
        return result
    
    # IDs of every movie the user has already rated (built once for fast lookups)
    user_rated_movie_ids = {movie_id for movie_id, _, _ in liked_movies + disliked_movies}
    
    # Find similar users for collaborative filtering
    similar_users = find_similar_users(user_id, user_item_matrix, unique_user_ids, user_idx_map, top_n=10)
    
    # Get content-based recommendations from liked movies
    content_based_recs = []
    for movie_id, _, _ in liked_movies[:5]:  # Use up to 5 liked movies
        similar = find_similar_movies(movie_id, top_n=3)
        content_based_recs.extend(similar)
    
    # Get collaborative recommendations from similar users
    collab_recs = []
    for similar_user_id, _ in similar_users:
        # Movies this similar user liked that our user hasn't rated
        similar_user_liked, _ = get_user_rated_movies(similar_user_id)
        unseen_liked = [rec for rec in similar_user_liked if rec[0] not in user_rated_movie_ids]
        
        # Keep only this user's 10 highest-rated unseen movies so the candidate list
        # (and therefore the prompt) stays small enough for the model's context window
        unseen_liked.sort(key=lambda rec: rec[2], reverse=True)
        collab_recs.extend(unseen_liked[:10])
    
    # Combine recommendations and remove duplicates (content-based candidates take priority)
    all_recs = {}
    
    # Add content-based recommendations
    for movie_id, similarity in content_based_recs:
        if movie_id not in all_recs and movie_id not in user_rated_movie_ids:
            all_recs[movie_id] = (movie_id_to_title.get(movie_id, 'Unknown'), similarity, 'content')
    
    # Add collaborative recommendations
    for movie_id, title, rating in collab_recs:
        if movie_id not in all_recs and movie_id not in user_rated_movie_ids:
            all_recs[movie_id] = (title, rating, 'collaborative')
    
    # Prepare data for Phi-4 (via Ollama)
    # Create a context with the user's preferences and candidate recommendations
    liked_movies_text = "\n".join([f"- {title} (Rated: {rating})" 
                              for _, title, rating in liked_movies[:10]])
    
    disliked_movies_text = "\n".join([f"- {title} (Rated: {rating})" 
                                 for _, title, rating in disliked_movies[:5]])
    
    # Create a list of candidate movies with more details
    candidate_movies = []
    for movie_id, (title, _, source) in all_recs.items():
        # Get genres for this movie
        genres = movies_df[movies_df['movieId'] == movie_id]['genres'].values[0].replace('|', ', ')
        candidate_movies.append({
            'movie_id': int(movie_id),
            'title': title,
            'genres': genres,
            'recommendation_source': source
        })
    
    # Format candidate movies as text for the prompt, including each movie ID
    # so the LLM can copy real IDs instead of inventing them
    candidate_movies_text = "\n".join([f"- [ID {m['movie_id']}] {m['title']} (Genres: {m['genres']}) (Source: {m['recommendation_source']})" 
                                  for m in candidate_movies])
    
    # Create a prompt for the LLM with structured output instructions
    prompt_template = """
    You are a movie recommendation expert. Based on a user's movie preferences, 
    recommend {num_recommendations} movies they might enjoy.
    
    Movies the user liked:
    {liked_movies}
    
    Movies the user disliked:
    {disliked_movies}
    
    Candidate movies to recommend (each line starts with the movie's ID):
    {candidate_movies}
    
    Analyze the user's taste based on genres, themes, and directors they prefer. 
    Consider both content similarity and recommendations from similar users.
    
    ONLY recommend movies from the candidate list above.
    
    Return your recommendations as a JSON list of dictionaries, with each dictionary having these keys:
    - "movie_id": (the movie ID shown in the candidate list, as an integer)
    - "title": (the movie title as a string)
    - "genres": (the movie genres as a string)
    - "recommendation_reason": (a brief reason why you're recommending this movie)
    - "recommendation_source": (either "content" or "collaborative")
    - "recommendation_score": (a score from 0.0 to 1.0 indicating how strongly you recommend this movie)
    
    Here's an example of the expected output format:
    ```json
    [
      {{
        "movie_id": 123,
        "title": "Movie Title 1",
        "genres": "Action, Adventure",
        "recommendation_reason": "Similar to user's preferred action movies",
        "recommendation_source": "content",
        "recommendation_score": 0.95
      }},
      ...
    ]
    ```
    
    Generate exactly {num_recommendations} recommendations in your response.
    IMPORTANT: Make sure your response is a valid JSON list that can be parsed with json.loads().
    """
    
    # Use PromptTemplate
    prompt = PromptTemplate(
        input_variables=["num_recommendations", "liked_movies", "disliked_movies", "candidate_movies"],
        template=prompt_template
    )
    
    # Create the chain
    chain = LLMChain(llm=llm, prompt=prompt)
    
    # Run the chain
    try:
        response = chain.invoke({
            "num_recommendations": num_recommendations,
            "liked_movies": liked_movies_text if liked_movies else "None",
            "disliked_movies": disliked_movies_text if disliked_movies else "None",
            "candidate_movies": candidate_movies_text if all_recs else "None"
        })
        
        # Extract the JSON part of the response
        response_text = response['text']
        
        # Find JSON in the response - it might be wrapped in code blocks
        json_pattern = response_text
        if '```json' in response_text:
            json_pattern = response_text.split('```json')[1].split('```')[0].strip()
        elif '```' in response_text:
            json_pattern = response_text.split('```')[1].split('```')[0].strip()
        
        # Parse the JSON
        try:
            recommendations_list = json.loads(json_pattern)
            
            # Ensure all required fields are present
            for rec in recommendations_list:
                if 'movie_id' not in rec or 'title' not in rec:
                    raise ValueError("Missing required fields in recommendation")
            
            return recommendations_list
        except json.JSONDecodeError as je:
            print(f"Error parsing JSON: {je}")
            print(f"Raw response: {response_text}")
            raise
            
    except Exception as e:
        print(f"Error generating recommendations: {e}")
        # Fallback: pick a random sample of candidates without LLM processing
        sampled_recs = random.sample(list(all_recs.items()), k=min(num_recommendations, len(all_recs)))
        
        result = []
        for movie_id, (title, score, source) in sampled_recs:
            # Get genres for this movie
            genres = movies_df[movies_df['movieId'] == movie_id]['genres'].values[0].replace('|', ', ')
            
            # Content scores are cosine similarities, while collaborative scores are
            # star ratings (0.5-5.0), so rescale the latter to the 0.0-1.0 range
            normalized_score = float(score) / 5.0 if source == 'collaborative' else float(score)
            
            result.append({
                'movie_id': int(movie_id),
                'title': title,
                'genres': genres,
                'recommendation_reason': f"Recommended based on {source} similarity",
                'recommendation_source': source,
                'recommendation_score': normalized_score
            })
            
        return result
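The parsing step above takes the text between the first pair of code fences, or the whole response when there are none. A slightly more forgiving variant, sketched below with the hypothetical helper `extract_json_list`, prefers a fenced block, falls back to the outermost `[` ... `]` span of the raw text, and checks that the result is a list of dictionaries before returning it.

```python
import json
import re

# A fenced block, optionally tagged "json"; DOTALL lets ".*?" span newlines
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)

def extract_json_list(text):
    """Return the JSON list embedded in an LLM response, or raise ValueError."""
    match = FENCE_RE.search(text)
    if match:
        candidate = match.group(1)
    else:
        # No fences: fall back to the outermost square-bracketed span
        start, end = text.find("["), text.rfind("]")
        if start == -1 or end <= start:
            raise ValueError("No JSON list found in response")
        candidate = text[start:end + 1]
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON: {exc}") from exc
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("Expected a JSON list of objects")
    return data
```

Raising `ValueError` in every failure case lets the caller's existing `except Exception` branch fall back to the non-LLM recommendations.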

In [12]:
# Test the recommender system with a sample user
def test_recommender(user_id=None):
    if user_id is None:
        # Pick a random user with at least 10 ratings
        user_rating_counts = ratings_df['userId'].value_counts()
        active_users = user_rating_counts[user_rating_counts >= 10].index.tolist()
        user_id = random.choice(active_users)
    
    print(f"\nGetting recommendations for user {user_id}...\n")
    
    # Get the user's liked and disliked movies
    liked_movies, disliked_movies = get_user_rated_movies(user_id)
    
    print(f"User {user_id} has rated {len(liked_movies) + len(disliked_movies)} movies")
    print(f"Liked {len(liked_movies)} movies, including:")
    for _, title, rating in liked_movies[:5]:  # Show first 5
        print(f"- {title} (Rated: {rating})")
    
    print(f"\nDisliked {len(disliked_movies)} movies, including:")
    for _, title, rating in disliked_movies[:5]:  # Show first 5
        print(f"- {title} (Rated: {rating})")
    
    print("\nGenerating recommendations...\n")
    recommendations = get_rag_recommendations(user_id, num_recommendations=5)
    
    print("Recommended Movies (Structured Output):")
    for i, rec in enumerate(recommendations, 1):
        print(f"{i}. {rec['title']}")
        print(f"   Movie ID: {rec['movie_id']}")
        print(f"   Genres: {rec['genres']}")
        print(f"   Reason: {rec['recommendation_reason']}")
        print(f"   Source: {rec['recommendation_source']}")
        if 'recommendation_score' in rec:
            print(f"   Score: {rec['recommendation_score']:.2f}")
        print()
    
    # Also print the raw JSON for verification
    print("Raw JSON output:")
    print(json.dumps(recommendations, indent=2))
    return recommendations

# Run the test
res = test_recommender()


Getting recommendations for user 55381...

User 55381 has rated 23 movies
Liked 20 movies, including:
- Quick and the Dead, The (1995) (Rated: 4.0)
- Silence of the Lambs, The (1991) (Rated: 4.0)
- Monty Python and the Holy Grail (1975) (Rated: 4.5)
- One Flew Over the Cuckoo's Nest (1975) (Rated: 3.5)
- 12 Angry Men (1957) (Rated: 4.0)

Disliked 3 movies, including:
- Indian in the Cupboard, The (1995) (Rated: 3.0)
- Say Anything... (1989) (Rated: 3.0)
- Fast Times at Ridgemont High (1982) (Rated: 3.0)

Generating recommendations...

Recommended Movies (Structured Output):
1. Ghost Rider
   Movie ID: 1
   Genres: Action, Fantasy, Thriller
   Reason: Similar to user's preferred action and fantasy movies.
   Source: content
   Score: 0.92

2. The Sandlot
   Movie ID: 2
   Genres: Children, Comedy, Drama
   Reason: Appealing to users who enjoy heartfelt adventures and comedies.
   Source: collaborative
   Score: 0.88

3. Big Trouble in Little China
   Movie ID: 3
   Genres: Action, Adve

In [19]:
movies_df.query("movieId == 58293")  # Example query to check a specific movie

       movieId             title                      genres                            text_representation
12182    58293  10,000 BC (2008)  Adventure|Romance|Thriller  10,000 BC (2008) - Adventure Romance Thriller


In [20]:
res

[{'movie_id': 1,
  'title': 'Avatar',
  'genres': 'Action, Adventure, Sci-Fi, IMAX',
  'recommendation_reason': 'Combines action and adventure with stunning sci-fi elements.',
  'recommendation_source': 'content',
  'recommendation_score': 0.98},
 {'movie_id': 2,
  'title': 'Thor',
  'genres': 'Action, Adventure, Drama, Fantasy, IMAX',
  'recommendation_reason': 'Mixes action and adventure with a strong fantasy element.',
  'recommendation_source': 'content',
  'recommendation_score': 0.95},
 {'movie_id': 3,
  'title': 'How to Train Your Dragon',
  'genres': 'Adventure, Animation, Children, Fantasy, IMAX',
  'recommendation_reason': 'Blends adventure and fantasy with engaging animation.',
  'recommendation_source': 'content',
  'recommendation_score': 0.92},
 {'movie_id': 4,
  'title': 'Avengers: Age of Ultron',
  'genres': 'Action, Adventure, Sci-Fi',
  'recommendation_reason': 'Offers a thrilling blend of action and sci-fi.',
  'recommendation_source': 'content',
  'recommendation_sc
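The `movie_id` values in this output (1, 2, 3, ...) do not match the titles: in MovieLens, movie ID 1 is Toy Story (1995), as the first rows of `movies_df` show, yet here it is paired with Avatar. This output was produced while the candidate list in the prompt carried titles, genres, and sources but no IDs, so the model had nothing to copy. The parsed list can also be checked against the candidates directly. The sketch below (`validate_recommendations`, a hypothetical helper) drops entries whose ID is missing, not a candidate, or a repeat; restores the canonical title, genres, and source; and clamps the score to 0.0-1.0.

```python
def validate_recommendations(recs, candidates, limit):
    """Keep LLM recommendations that point at real candidates, with canonical fields.

    recs:       list of dicts parsed from the LLM response
    candidates: {movie_id: {"title": ..., "genres": ..., "recommendation_source": ...}}
    limit:      maximum number of recommendations to return
    """
    cleaned, seen = [], set()
    for rec in recs:
        try:
            movie_id = int(rec["movie_id"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric ID
        if movie_id not in candidates or movie_id in seen:
            continue  # hallucinated or duplicate recommendation
        seen.add(movie_id)
        try:
            score = float(rec.get("recommendation_score", 0.0))
        except (TypeError, ValueError):
            score = 0.0
        canonical = candidates[movie_id]
        cleaned.append({
            "movie_id": movie_id,
            "title": canonical["title"],
            "genres": canonical["genres"],
            "recommendation_reason": str(rec.get("recommendation_reason", "")),
            "recommendation_source": canonical["recommendation_source"],
            "recommendation_score": min(max(score, 0.0), 1.0),
        })
        if len(cleaned) == limit:
            break
    return cleaned
```

Inside `get_rag_recommendations`, the candidates dictionary could be built as `{m['movie_id']: m for m in candidate_movies}` and the check applied to `recommendations_list` before it is returned, falling back when the cleaned list comes back empty.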

In [14]:
# Save recommendations to a file
def save_recommendations_to_file(recommendations, user_id, filename=None):
    if filename is None:
        filename = f"user_{user_id}_recommendations.json"
    
    with open(filename, 'w') as f:
        json.dump(recommendations, f, indent=2)
    
    print(f"Recommendations saved to {filename}")

# Example usage
# Get recommendations for a specific user
user_id = 1  # Replace with desired user ID
recommendations = get_rag_recommendations(user_id, num_recommendations=10)

# Save to file
save_recommendations_to_file(recommendations, user_id)

Error generating recommendations: Missing some input keys: {'\n        "movie_id"'}
Recommendations saved to user_1_recommendations.json
