# Integrating External Recommenders with RAG for Context-Aware Conversational Movie Recommendation (Baseline)

In this project, we developed a conversational movie recommender that integrates a Retrieval-Augmented Generation (RAG) architecture with specialized external recommenders trained on movie datasets. It was implemented as the final project for the Recommender Systems course at Toronto Metropolitan University.

By combining the reasoning capabilities of LLMs with domain-specific recommenders, we created a conversational recommender that considers context to provide accurate recommendations. We implemented and compared the performance of the system using three external domain recommenders: 1) a BERT-based Transformer with a trainable recommender head, 2) a Relational Graph Convolutional Neural Network (RGCN), and 3) Neural Collaborative Filtering (NCF), all trained on the INSPIRED movie dataset.

We implemented the following models:

* LLM + RAG (Baseline)
* [LLM + RAG + RGCN](./RGCN.ipynb)
* LLM + RAG + NCF
* LLM + RAG + Transformer

In this notebook, the *LLM + RAG (Baseline)* model is implemented. It also serves as the base to which the external recommenders are connected.

## Environment Initialization

In [2]:
'''
Function:
    - Check Current Working Directory
    - Move to Correct Directory
'''
import os

os.chdir("..")
print("Current Working Directory:", os.getcwd())

Current Working Directory: C:\Users\91953\Documents\GitHub\RAG-Movie-CRS


### Requirements

In [4]:
!pip install -r requirements.txt



### RAG, LLM, Global Settings

In [5]:
from pathlib import Path
import csv
import subprocess
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from tqdm import tqdm

In [6]:
# Initialize the embedding model
embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    request_timeout=300.0,
)

# Initialize the LLM with optimized settings
llm = Ollama(
    model="llama3.2:1B",
    request_timeout=300.0,
    temperature=0.1,  # Keep the model predictable
    additional_kwargs={"num_gpu": 0}  # Forcing CPU usage
)

# Set global configurations
Settings.embed_model = embed_model
Settings.llm = llm

### Load Dataset

In [5]:
'''
Functions:
    - Load INSPIRED dataset from TSV 
    - Convert to documents

Args:
    - data_path: Path to the TSV file
    - max_rows: Maximum number of rows to load (None = load all rows)
'''

def load_inspired_dataset(data_path, max_rows=None):
    
    if not Path(data_path).exists():
        raise FileNotFoundError(f"INSPIRED dataset not found at '{data_path}'")
    
    documents = []
    
    # First pass: count total rows for progress bar
    with open(data_path, 'r', encoding='utf-8') as f:
        total_rows = sum(1 for _ in f) - 1  # Subtract header row

    # If not using all rows, select min of total and max
    # kept for debugging
    if max_rows is not None:
        total_rows = min(total_rows, max_rows)
    
    # 2nd pass: Load the TSV data
    with open(data_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f, delimiter='\t')
        
        for idx, row in enumerate(tqdm(reader, total=total_rows, desc="Loading data")):
            # Stop if reached max_rows
            if max_rows is not None and idx >= max_rows:
                break
                
            # Extract conversation information from TSV
            dialog_id = row.get('dialog_id', '')
            turn_id = row.get('turn_id', '')
            utterance = row.get('utterance', '')
            speaker = row.get('speaker', '')
            movie_name = row.get('movie_name', '')
            
            # Create document text
            doc_text = f"Speaker: {speaker}\nUtterance: {utterance}\n"
            
            if movie_name:
                doc_text += f"Movie mentioned: {movie_name}\n"
            
            # Create metadata
            metadata = {
                "dialog_id": dialog_id,
                "turn_id": turn_id,
                "speaker": speaker,
                "movie_name": movie_name if movie_name else "None"
            }
            
            # Create Document object
            doc = Document(text=doc_text, metadata=metadata)
            documents.append(doc)
    
    print(f"Loaded {len(documents)} out of {total_rows} conversational turns from INSPIRED dataset")
    return documents
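
The loader above expects a tab-separated file with a header row. As a minimal sketch of what each row turns into, the snippet below runs the same text-building step on an in-memory TSV. The two sample rows are made up for illustration, and `row_to_text` is a hypothetical helper that mirrors the loop body; it is not part of the notebook.

```python
import csv
import io

# Two made-up rows using the column names that load_inspired_dataset reads.
SAMPLE_TSV = (
    "dialog_id\tturn_id\tutterance\tspeaker\tmovie_name\n"
    "d1\t0\tHi, what kind of movies do you like?\tRECOMMENDER\t\n"
    "d1\t1\tI loved Get Out.\tSEEKER\tGet Out\n"
)

def row_to_text(row):
    """Hypothetical helper: build the same document text as the loader's loop body."""
    text = f"Speaker: {row.get('speaker', '')}\nUtterance: {row.get('utterance', '')}\n"
    # The movie line is only added when the column is non-empty.
    if row.get("movie_name"):
        text += f"Movie mentioned: {row['movie_name']}\n"
    return text

reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
sample_texts = [row_to_text(row) for row in reader]

for text in sample_texts:
    print(text)
```

Rows without a movie produce a two-line document, so the embedding for those turns reflects only the speaker and the utterance.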

In [6]:
'''
Functions:
    - Load the unprocessed INSPIRED movie database for reference
Args:
    - dataset_dir: Directory of the TSV file
'''

def load_movie_database(dataset_dir="data"):
    
    movie_db_path = Path(dataset_dir) / "raw" / "movie_database.tsv"
    
    if not movie_db_path.exists():
        print(f"Movie database not found at {movie_db_path}")
        return {}
    
    movies = {}
    with open(movie_db_path, 'r', encoding='utf-8') as f:
        reader = csv.DictReader(f, delimiter='\t')
        for row in reader:
            movie_id = row.get('movieId', '')
            movies[movie_id] = row
    
    print(f"Loaded {len(movies)} movies from FULL movie database")
    return movies

## Define Functions

### Create Vector Index

In [7]:
'''
Function:
    - Load INSPIRED dataset
    - Create vector index

Args:
    - dataset_dir: Directory containing the dataset
    - split: Which split to use (train, dev, test)
    - max_rows: Maximum number of rows to load (None for all rows)
    - load_movie_db: Whether to load movie database (needed for external recommenders)

Returns:
    - index: VectorStoreIndex for RAG
    - movie_db: Movie database (if load_movie_db=True), else None
'''

def load_and_index_documents(dataset_dir="data", split="train", max_rows=None, load_movie_db=True):
        
    # Construct path to the data file 
    data_path = Path(dataset_dir) / "processed" / f"{split}.tsv"
    
    if not data_path.exists():
        raise FileNotFoundError(
            f"Data file not found at {data_path}.\nCheck for typos in file path."
        )
    
    # Load INSPIRED documents with max_rows limit
    docs = load_inspired_dataset(data_path, max_rows=max_rows)
    
    if not docs:
        raise ValueError("No documents loaded from INSPIRED dataset")
    
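    # Load the movie database for reference (used by external recommenders).
    # Skipped when load_movie_db is False, in which case movie_db is None.
    movie_db = load_movie_database(dataset_dir) if load_movie_db else None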
    
    # Build the vector index from the loaded documents
    # using the embedding model
    print("Building vector index...")
    index = VectorStoreIndex.from_documents(
        docs, 
        embed_model=embed_model,
        show_progress=True
    )
    
    return index, movie_db

### Query Engine

In [8]:
'''
Function:
    - Create query engine with specified retrieval parameters
'''

def create_query_engine(index, similarity_top_k=5):
    
    query_engine = index.as_query_engine(
        llm=llm,
        similarity_top_k=similarity_top_k,
        response_mode="compact", # Reduce token usage
        streaming=True
    )
    
    return query_engine
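
To make the role of `similarity_top_k` concrete, here is a rough, library-free sketch of top-k retrieval. It assumes cosine similarity as the scoring function and uses made-up 3-dimensional vectors in place of real embeddings; `cosine_similarity` and `top_k` are illustrative helpers, not LlamaIndex functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=5):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (assumption: real ones come from the embedding model).
toy_docs = {
    "turn_a": [1.0, 0.0, 0.0],
    "turn_b": [0.6, 0.8, 0.0],
    "turn_c": [0.0, 1.0, 0.0],
}
toy_query = [1.0, 0.05, 0.0]

top_two = top_k(toy_query, toy_docs, k=2)
print(top_two)
```

A larger `k` gives the LLM more retrieved turns to draw on, at the cost of a longer prompt.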

### Adapt External Recommender
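
One possible way to connect an external recommender to the baseline is to place its ranked candidates in the text sent to the chat engine. The sketch below is an assumption, not the project's implementation: it supposes the recommender returns `(title, score)` pairs, and `format_candidates` and `build_augmented_prompt` are hypothetical helpers introduced only for illustration.

```python
def format_candidates(candidates, max_items=5):
    """Turn (title, score) pairs into a ranked text block, highest score first."""
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:max_items]
    lines = [
        f"{rank}. {title} (score: {score:.2f})"
        for rank, (title, score) in enumerate(ranked, start=1)
    ]
    return "\n".join(lines)

def build_augmented_prompt(user_message, candidates, max_items=5):
    """Prepend the recommender's candidates to the user's message."""
    return (
        "Candidate movies from the external recommender:\n"
        f"{format_candidates(candidates, max_items)}\n\n"
        f"User: {user_message}\n"
        "Prefer the candidates above when they fit the user's request."
    )

# Placeholder output from a hypothetical external recommender.
toy_candidates = [("Movie B", 0.42), ("Movie A", 0.91), ("Movie C", 0.77)]
augmented = build_augmented_prompt("I want a romance tonight.", toy_candidates, max_items=2)
print(augmented)
```

Under this assumption, the resulting string could be passed to `chat_engine.chat(...)` in place of the raw user input, so retrieval and generation both see the recommender's suggestions.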

### Interactive Conversation

In [12]:
'''
Interactive conversation with history tracking
'''
def interactive_conversation_with_history():
    
    try:
        # Load documents and create index
        print("\nLoading INSPIRED dataset...")
        index, _ = load_and_index_documents(split="train", max_rows=20)
        
        # Create chat engine instead of query engine
        # query_engine: Stateless - each query is independent
        # chat_engine: Stateful - remembers conversation history 
        #              better for multi-turn conversations
        chat_engine = index.as_chat_engine(
            llm=llm,
            similarity_top_k=5,
            chat_mode="context"  # Use context mode for history
        )
        
        
        print("|| MovieCRS is Ready ||")
        
        print("\nYou can now ask for movie recommendations.")
        print("Type 'quit', 'exit', 'q', or 'bye' to end the conversation.\n")
        
        while True:
            user_input = input("You: ").strip()
            
            if user_input.lower() in ['quit', 'exit', 'q', 'bye']:
                print("\nMovieCRS: Exiting...")
                break
            
            if not user_input:
                continue
            
            try:
                # Use chat() instead of query() - it handles history automatically
                print("\nMovieCRS: ", end="", flush=True)
                response = chat_engine.chat(user_input)
                
                # Print only the response text
                print(f"{response.response}\n")
                
            except Exception as e:
                print(f"Error: {str(e)}\n")
        
        return True
    
    except Exception as e:
        print(f"System Error: {str(e)}")
        return False

## Main

In [None]:
# Main execution
if __name__ == "__main__":
    
    print("Starting RAG Pipeline with INSPIRED Dataset...")
    
    # Start interactive conversation with history
    success = interactive_conversation_with_history()
    
    if not success:
        print("\nSystem failed to start.")

Starting RAG Pipeline with INSPIRED Dataset...

Loading INSPIRED dataset...


Loading data: 100%|██████████| 20/20 [00:00<00:00, 19930.17it/s]

Loaded 20 turns from INSPIRED dataset





Loaded 1 movies from movie database
Building vector index...


Parsing nodes:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

|| MovieCRS is Ready ||

You can now ask for movie recommendations.
Type 'quit', 'exit', or 'q' to end the conversation.



You:  do not ask any questions, only recommend. I like horror but I want to watch romance now.



MovieCRS: I'd be happy to help you find a new movie that combines horror and romance.

Considering your interest in horror movies, here are some recommendations that might also appeal to you:

1. **The Babadook** (2014) - A psychological horror-thriller about a mother and son who are haunted by a supernatural entity.
2. **It Follows** (2014) - A supernatural horror film about a young woman who is pursued by a relentless, shape-shifting entity.
3. **A Dark Song** (2016) - A supernatural horror movie about a grieving mother who rents a remote house in order to perform a ritual to contact her deceased son.

If you're looking for something a bit more lighthearted and romantic, here are some recommendations:

1. **The Love Witch** (2016) - A campy, retro-inspired horror-comedy that's also a romance.
2. **Starry Eyes** (2014) - A body horror film with a strong romantic subplot.
3. **Raw** (2016) - A French-Belgian horror film about a young vegetarian who becomes a carnivore after a near-dea

## Evaluation

### Standard Metrics

### Contextual Metrics

## Conclusion