# RAG Assistant: Netflix Movies Q&A

A production-ready **Retrieval-Augmented Generation (RAG) system** that answers questions about Netflix movies using semantic search and LLMs.

## ✨ Key Features
- **Semantic Search**: 7,655 chunked documents indexed in Chroma vector database
- **RAG Pipeline**: Context-aware answers grounded in actual movie data
- **REST API**: Flask endpoints (`/ask`, `/health`) for integration
- **Clean Architecture**: Separated concerns (notebook for data prep, Python files for production)
- **Lightweight**: CPU-optimized embeddings & local LLM inference (no API costs)

## 📊 Data & Stack
- **Data**: 6,020 Netflix movies with plots, genres, directors
- **Embeddings**: HuggingFace `all-MiniLM-L6-v2` (384-dimensional)
- **Vector DB**: Chroma (persistent, local)
- **LLM**: TinyLlama-1.1B-Chat-v1.0 (free, runs locally)
- **Framework**: LangChain (LCEL pipe syntax)
- **API**: Flask on port 5500

## 📁 Project Structure
```
├── rag_assistant.ipynb      # Data pipeline (Sections 1-9)
├── app.py                   # Flask API server
├── rag_pipeline.py          # Reusable RAG initialization
├── data/
│   ├── documents.csv        # Input: 6,020 Netflix movies
│   └── vectorstore/         # Output: Chroma persistence
└── README.md                # Project documentation
```

## 🚀 Quick Start
```bash
# Terminal 1: Run the notebook (Sections 1-8) to build vector store
jupyter notebook rag_assistant.ipynb

# Terminal 2: Start the API
source .venv/bin/activate
python app.py

# Terminal 3: Test the API
curl -X POST http://localhost:5500/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What are some comedy movies?"}'
```
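
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_ask_request` and `ask` are illustrative and not part of the project, and the response shape is assumed to be JSON.

```python
import json
import urllib.request

API_URL = "http://localhost:5500/ask"  # host and port taken from the Quick Start


def build_ask_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 120.0) -> dict:
    """Send the question and decode the JSON reply (needs a running server)."""
    request = build_ask_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `ask("What are some comedy movies?")` should return the decoded response body. The generous timeout is a guess that allows for slow CPU inference.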

## Section 1: Set Up Your Environment & Import Libraries

Install dependencies and configure your development environment.

In [122]:
# 1a: Import core libraries
import pandas as pd
from pathlib import Path

# 1b: Import LangChain components (updated for current version)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from langchain_community.llms import HuggingFacePipeline

print("✓ All imports successful - ready to load data!")

✓ All imports successful - ready to load data!


## Section 2: Load and Prepare Your Documents

Load CSV into memory, normalize text, extract metadata.

In [None]:
# 2a: Load CSV (path is relative to the project root, so the notebook runs on any machine)
df = pd.read_csv(Path("data") / "documents.csv")

# 2b: Inspect the dataframe
print(df.shape)
print(df.columns)
# print(df.iloc[0])

# 2c: Print content length statistics
# Documents vary in size. Min/max help you choose good chunk_size later (avoid too small/large).
print(f"Min: {df['content'].str.len().min()}")
print(f"Max: {df['content'].str.len().max()}")

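# 2d: Create normalize_text() function
def normalize_text(text):
    """Collapse every whitespace run (including newlines) into a single space."""
    if pd.isna(text):
        return ""
    # str.split() with no arguments splits on any whitespace and drops
    # leading/trailing whitespace, so no extra replace() or strip() is needed.
    return " ".join(text.split())

# 2e: Apply normalization
df['content_ready'] = df['content'].apply(normalize_text)
# print("✓ Documents loaded and normalized")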


(6020, 5)
Index(['id', 'source', 'title', 'content', 'category'], dtype='str')
Min: 199
Max: 1277


In [124]:
print(normalize_text(df.iloc[0]['content']))

An assassin is shot by her ruthless employer, Bill, and other members of their assassination circle – but she lives to plot her vengeance. Genre: Action, Crime, Thriller Type: movie Release Year: 2003 Director: Quentin Tarantino Actors: Uma Thurman, Lucy Liu, Vivica A. Fox, Daryl Hannah, David Carradine Uma Thurman Lucy Liu Vivica A Fox Daryl Hannah David Carradine IMDB Rating: 8.2 Available in 67 countries
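
The section summary mentions extracting metadata, while the cell above only normalizes text. One possible way to pull the labelled fields out of a normalized `content` string is sketched below. The field labels are copied from the single sample row shown above; that every row follows the same layout is an assumption, and `extract_fields` is a hypothetical helper, not part of the notebook.

```python
import re

# Labels as they appear in the sample row; other rows are assumed to match.
FIELD_LABELS = ["Genre", "Type", "Release Year", "Director", "Actors", "IMDB Rating"]
_LABELS = "|".join(re.escape(label) for label in FIELD_LABELS)

# A field runs from "Label:" up to the next label, "Available in", or end of text.
_FIELD_RE = re.compile(
    rf"\b(?P<label>{_LABELS}):\s*(?P<value>.*?)\s*(?=\b(?:{_LABELS}):|Available in|$)"
)


def extract_fields(content: str) -> dict:
    """Return the labelled fields found in one normalized content string."""
    fields = {m.group("label"): m.group("value") for m in _FIELD_RE.finditer(content)}
    countries = re.search(r"Available in (\d+) countries", content)
    if countries:
        fields["Countries"] = int(countries.group(1))
    return fields
```

Values are returned as strings (except the country count), so numeric fields such as the release year or rating would still need converting before filtering on them.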


## Section 3: Split Documents into Chunks

Split long documents into manageable chunks with overlap for context preservation.

In [125]:
# 3a: Initialize RecursiveCharacterTextSplitter with chunk_size=500, chunk_overlap=50
# Both values are measured in characters (the splitter's default length function is len).
# Documents here run 199-1,277 characters, so 500-character chunks keep each one focused.
# The 50-character overlap carries context across chunk boundaries.
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " "],
    chunk_size=500,
    chunk_overlap=50
)

# 3b: Split all documents into chunks
# Most documents fit in one or two chunks, so 6,020 documents become 7,655 chunks.
# Smaller chunks give finer-grained semantic search (better precision).
# Why metadata fields: Thread info through pipeline so you can tell users WHERE each answer came from.
chunks = []
for _, row in df.iterrows():
    for i, text in enumerate(splitter.split_text(row['content_ready'])):
        chunks.append({
            'doc_id': row['id'],
            'title': row['title'],
            'source': row['source'],
            'category': row['category'],
            'chunk_idx': i,
            'text': text
        })
print(f"Created {len(chunks)} chunks")


# 3c: Show chunk statistics
# Why: Verify chunking worked as expected. Odd min/max = adjust chunk_size and re-run.
chunk_lengths = [len(c['text']) for c in chunks]
print(f"Chunk lengths - Min: {min(chunk_lengths)}, Max: {max(chunk_lengths)}")

# 3d: Show example chunks
# Read actual chunks to confirm they're meaningful (not cut mid-word/sentence).
example_chunks = chunks[:2]
for i, chunk in enumerate(example_chunks):
    print(f"Chunk {i}: {chunk['text'][:150]}...")

# 3e: View as DataFrame
chunks_df = pd.DataFrame(chunks)
print(chunks_df.head())


Created 7655 chunks
Chunk lengths - Min: 55, Max: 500
Chunk 0: An assassin is shot by her ruthless employer, Bill, and other members of their assassination circle – but she lives to plot her vengeance. Genre: Acti...
Chunk 1: Jarhead is a film about a US Marine Anthony Swofford’s experience in the Gulf War. After putting up with an arduous boot camp, Swofford and his unit a...
   doc_id                                  title              source  \
0       1      Kill Bill: Vol. 1 Kill Bill Vol 1  netflix_titles.csv   
1       2                                Jarhead  netflix_titles.csv   
2       2                                Jarhead  netflix_titles.csv   
3       3  Eternal Sunshine of the Spotless Mind  netflix_titles.csv   
4       3  Eternal Sunshine of the Spotless Mind  netflix_titles.csv   

    category  chunk_idx                                               text  
0     Action          0  An assassin is shot by her ruthless employer, ...  
1  Biography          0  Jarh
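
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified fixed-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on the listed separators); it only illustrates how consecutive chunks share `overlap` characters. The function name is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small values make the shared characters easy to see.
demo = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1)
```

Under this simplified scheme, a 1,277-character document (the longest in the dataset) yields three windows starting at offsets 0, 450, and 900.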

## Section 4: Create Embeddings for Your Data

Generate embeddings for all chunks using HuggingFace (all-MiniLM-L6-v2).

In [None]:
# 4a: Initialize HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"batch_size": 32, "normalize_embeddings": True}
)

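# 4b: Embed all chunks in batches of 32
# This pass is a sanity check: Section 5 hands `embeddings` to Chroma, which
# embeds the documents again when they are added, so all_embeddings is not reused.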

all_embeddings = []
batch_size = 32
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i+batch_size]
    batch_texts = [c['text'] for c in batch]
    batch_embeddings = embeddings.embed_documents(batch_texts)
    all_embeddings.extend(batch_embeddings)

print(f"Embedded {len(all_embeddings)} chunks")

# Rough size estimate: whitespace-separated word count (a proxy, not model tokens)
total_tokens = sum(len(c['text'].split()) for c in chunks)
print(f"Total tokens embedded: {total_tokens:,}")


Embedded 7655 chunks
Total tokens embedded: 435,082
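
Because `normalize_embeddings` is set to `True`, every embedding has unit length. For unit vectors the dot product equals cosine similarity, and squared Euclidean distance equals `2 - 2 * cosine`, so both measures rank neighbours identically. The toy two-dimensional vectors below are made up purely to show the arithmetic; real embeddings have 384 dimensions.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = l2_normalize(a), l2_normalize(b)
```

Here `cosine(a, b)` and `dot(ua, ub)` agree, which is why a store can skip the norm computation once vectors are normalized.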


## Section 5: Build & Query a Local Vector Store

Store embeddings in Chroma vector database and test similarity search.

In [None]:
# 5a: Create LangChain Document objects from chunks
# Wraps chunks with metadata in a format LangChain understands (standard object type).

documents = []
for chunk in chunks:
    doc = Document(
        page_content=chunk['text'],
        metadata={
            'doc_id': chunk['doc_id'],
            'title': chunk['title'],
            'source': chunk['source'],
            'category': chunk['category'],
            'chunk_idx': chunk['chunk_idx']
        }
    )
    documents.append(doc)

print(f"✓ Created {len(documents)} Document objects")

# 5b: Initialize Chroma vector store
# Chroma: Fast similarity search + persistent storage (saves time on re-runs).
# persist_directory: Stores vectors on disk so you don't re-embed on next notebook run.

vectorstore = Chroma(
    persist_directory="./data/vectorstore",
    embedding_function=embeddings
)

# 5c: Populate vector store with documents
# Process 100 at a time to avoid memory spikes. IDs must be unique across the whole
# collection, so they are built from the global index (i + j), not the batch-local one.

batch_size = 100
for i in range(0, len(documents), batch_size):
    batch = documents[i:i+batch_size]
    batch_ids = [f"doc_{i + j}" for j in range(len(batch))]
    vectorstore.add_documents(batch, ids=batch_ids)

print(f"✓ Vector store populated with {len(documents)} documents")

# 5d: Test similarity search
# Confirm vector store populated and working. No results = empty or corrupted store.

test_queries = ["What are action movies?", "Tell me about animated films"]
for query in test_queries:
    results = vectorstore.similarity_search(query, k=2)
    print(f"\nQuery: {query}")
    for doc in results:
        print(f"  - {doc.metadata['title']}: {doc.page_content[:100]}...")


✓ Created 7655 Document objects
✓ Vector store populated with 7655 documents

Query: What are action movies?
  - Sakuna: Of Rice and Ruin Sakuna Of Rice and Ruin: . Her new adventure begins! Genre: Action, Adventure, Animation Type: tv Release Year: 2024 Director...
  - Toughest Forces on Earth: Three adventurous veterans train alongside some of the world's most elite military units, getting an...

Query: Tell me about animated films
  - The Grimm Variations: Inspired by the classic Brothers Grimm stories, this anthology features six fairy tales with a dark ...
  - Sakuna: Of Rice and Ruin Sakuna Of Rice and Ruin: . Her new adventure begins! Genre: Action, Adventure, Animation Type: tv Release Year: 2024 Director...
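
Whatever IDs are passed to `add_documents` need to be unique across the whole collection, not just within one batch. The sketch below uses a plain dict as a stand-in for an ID-keyed store (it is not Chroma, and it assumes a later write with an existing ID replaces the earlier one) to show how IDs that restart in every batch silently shrink the collection, while IDs derived from `doc_id` and `chunk_idx` keep every chunk.

```python
def batched(items, size):
    """Yield (start_index, batch) pairs of at most `size` items."""
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def populate(chunks, batch_size, id_for):
    """Write chunks into a dict keyed by ID; a repeated ID replaces the old entry."""
    store = {}
    for start, batch in batched(chunks, batch_size):
        for offset, chunk in enumerate(batch):
            store[id_for(start, offset, chunk)] = chunk["text"]
    return store


# Six toy chunks: three documents with two chunks each.
toy_chunks = [
    {"doc_id": d, "chunk_idx": c, "text": f"doc {d} chunk {c}"}
    for d in range(1, 4)
    for c in range(2)
]

# IDs that restart at 0 in each batch collide across batches.
per_batch = populate(toy_chunks, 2, lambda start, offset, chunk: f"doc_{offset}")

# IDs derived from the chunk's own metadata are stable and unique.
stable = populate(
    toy_chunks, 2, lambda start, offset, chunk: f"{chunk['doc_id']}_{chunk['chunk_idx']}"
)
```

Metadata-derived IDs have a second benefit: re-running the notebook against the persisted store writes to the same keys instead of adding a second copy of every chunk.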


## Section 6: Create a Retriever

Wrap the vector store in a LangChain Retriever for easy integration.

In [None]:
# 6a: Helper to retrieve similar documents
def find_n_closest(query, retriever, top_k):
    """Return at most top_k documents from the retriever.

    The retriever already limits results to k (set in search_kwargs),
    so top_k can only trim that list, not extend it.
    """
    docs = retriever.invoke(query)
    return docs[:top_k]


# 6b: Create retriever (returns top 3 similar docs)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)


# 6c: Test retriever with a simple retrieval question
# Verify retriever finds semantically similar documents (not analytical questions yet)

query = "action movies"

retrieved_docs = find_n_closest(query, retriever, top_k=3)
print(f"\nQuery: '{query}'")
print(f"Found {len(retrieved_docs)} similar documents:\n")
for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. {doc.metadata['title']}")
    print(f"   Preview: {doc.page_content[:120]}...\n")

# 6d: Verify metadata is preserved
# Confirm title, source, category carry through the pipeline for later use

doc = retrieved_docs[0]
assert {"title", "source", "category"} <= doc.metadata.keys()




Query: 'action movies'
Found 3 similar documents:

1. The Hijacking of Flight 601
   Preview: After a plane is hijacked, two flight attendants must outwit their assailants amid intense negotiations in the air and o...

2. Toughest Forces on Earth
   Preview: Three adventurous veterans train alongside some of the world's most elite military units, getting an inside look at thei...

3. Toughest Forces on Earth
   Preview: Three adventurous veterans train alongside some of the world's most elite military units, getting an inside look at thei...
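
The output above lists the same title and preview twice, which wastes context space once the results are passed to the LLM. A small post-retrieval filter can drop exact repeats. In the sketch below, `Doc` is a hypothetical stand-in with the same two attributes as a LangChain `Document`, so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content + metadata only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Drop repeated chunks while keeping the original ranking order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("doc_id"), doc.metadata.get("chunk_idx"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Filtering after retrieval can return fewer than `k` documents; requesting a slightly larger `k` and trimming afterwards is one way to compensate.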



## Section 7: Build the RAG Chain with LLM

Create prompt template and connect: Retriever → Prompt → LLM

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

# 7a: Create PromptTemplate for RAG
# This assistant ONLY answers questions about Netflix movies
prompt_temp = PromptTemplate(
    input_variables=["context", "question"],
    template="""You are a Netflix movie expert. Answer ONLY questions about Netflix movies based on the context below.
If the question is not about Netflix movies, say you can only help with Netflix movie questions.
Be direct and concise.

Context: 
{context}

Question: {question}

Answer:
"""
)

# 7b: Format Document objects to clean text
def format_docs(docs):
    """Format Document objects into clean text for LLM"""
    return "\n\n---\n\n".join([doc.page_content for doc in docs])

# 7c: Initialize TinyLlama-1.1B (1.1B parameters, lightweight + fast)
llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "temperature": 0.9,
        "top_p": 0.9,
        "do_sample": True
    }
)

# 7d: Create LCEL RAG chain with proper document formatting
# The dict step fans the question out: "context" runs the retriever and formats
# the hits, while "question" passes the raw string through unchanged.
rag_chain = (
    {
        "context": retriever | RunnableLambda(format_docs),
        "question": RunnablePassthrough(),
    }
    | prompt_temp
    | llm
    | StrOutputParser()
)
# Returns: the generated text as a string (retrieved documents are not included)

print("✓ LCEL RAG chain ready")

Device set to use cpu


✓ LCEL RAG chain ready
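
The chain ends in `StrOutputParser`, so it yields a single string, while the API response described in Section 9 carries both an answer and a list of sources. One way to get both from a single retrieval is a thin wrapper like the sketch below. `retrieve` and `generate` are injected callables standing in for the retriever and the LLM, the prompt is abbreviated, and documents are plain dicts; all of these are assumptions made to keep the example self-contained.

```python
# Abbreviated version of the prompt in 7a (instructions omitted for brevity).
PROMPT = "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:\n"


def answer_with_sources(question, retrieve, generate) -> dict:
    """Retrieve once, then reuse the hits for both the context and the source list."""
    docs = retrieve(question)
    context = "\n\n---\n\n".join(doc["text"] for doc in docs)
    answer = generate(PROMPT.format(context=context, question=question))
    # dict.fromkeys removes duplicate titles while preserving retrieval order.
    sources = list(dict.fromkeys(doc["title"] for doc in docs))
    return {"answer": answer.strip(), "sources": sources}
```

Passing the dependencies in as arguments also makes the function easy to test with stubs, without loading the embedding model or TinyLlama.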


## Section 8: Test Your RAG System

Run manual tests on various question types and inspect results.

In [136]:
# 8a: Test the LCEL RAG chain with actual LLM responses
query = "What documentaries are available?"
# 8b: Invoke the chain - retriever → format → prompt → LLM → response

result = rag_chain.invoke(query)

print(f"\nResponse:\n{result}")



Response:
You are a Netflix movie expert. Answer ONLY questions about Netflix movies based on the context below.
If the question is not about Netflix movies, say you can only help with Netflix movie questions.
Be direct and concise.

Context: 
In this gripping docuseries, legendary reporter George Knapp travels the globe to uncover new evidence about UFOs and investigate their presence on Earth. Genre: Documentary Type: tv Release Year: 2024 Director: Unknown Actors: George Knapp IMDB Rating: 6.5 Available in 128 countries

---

In this gripping docuseries, legendary reporter George Knapp travels the globe to uncover new evidence about UFOs and investigate their presence on Earth. Genre: Documentary Type: tv Release Year: 2024 Director: Unknown Actors: George Knapp IMDB Rating: 6.5 Available in 128 countries

---

Eerie encounters, bizarre disappearances, haunting events and more perplexing phenomena are explored in this chilling investigative docuseries. Genre: Crime, Documentary, My
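
The response above begins with the full prompt, which suggests the text-generation pipeline is returning the prompt together with the completion. A post-processing step can keep only the newly generated part. The helper below is a hypothetical sketch: it first strips an exact prompt prefix, then falls back to the text after the last `Answer:` marker used by the prompt template.

```python
def strip_prompt_echo(prompt: str, generated: str, marker: str = "Answer:") -> str:
    """Return only the model's completion from text that may echo the prompt."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    # Fallback: keep whatever follows the last marker, if the marker is present.
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()
```

Hugging Face text-generation pipelines also accept `return_full_text=False`, which avoids the echo at the source; whether that setting reaches the pipeline through `pipeline_kwargs` depends on the installed LangChain version, so it is worth verifying before relying on it.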

## Section 9: REST API Deployment

Deploy the RAG system as a REST API for production use.

In [None]:
# Flask API
#
# Files created:
# - app.py: Flask application with REST endpoints
# - rag_pipeline.py: RAG pipeline initialization (embeddings, retriever, LLM chain)
#
# Architecture (industry standard):
# 
# ┌─────────────────┐
# │   HTTP Layer    │  (app.py - Flask routes)
# │  /ask, /health  │
# └────────┬────────┘
#          ↓
# ┌─────────────────┐
# │  RAG Logic      │  (rag_pipeline.py - core functionality)
# │ ask_question()  │
# └────────┬────────┘
#          ↓
# ┌─────────────────┐
# │  Data Layer     │  (Sections 1-8 - retriever, LLM chain)
# │  RAG Pipeline   │
# └─────────────────┘
# 
# Why separate files?
# - Cleaner: Notebook focuses on learning, code focuses on production
# - Reusable: RAG logic can be imported by multiple services
# - Testable: Each layer tested independently
# - Scalable: Deploy API and pipeline separately if needed

# Running the API
# Terminal 1 (run RAG API server):
#   $ source .venv/bin/activate
#   $ python app.py
#   Output: 🚀 Flask API running on http://127.0.0.1:5500
#
# Terminal 2 (test the API):
#   $ curl -X POST http://localhost:5500/ask \
#     -H "Content-Type: application/json" \
#     -d '{"question": "What are some comedy movies?"}'
#
# Response:
#   {
#     "answer": "Based on Netflix movies...",
#     "sources": ["Movie Title 1", "Movie Title 2", "Movie Title 3"]
#   }
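
The contents of `app.py` are not shown here, so the following is only a sketch of the logic the two routes might delegate to, written without Flask so it can run on its own. Each handler returns a `(status_code, body)` pair that a route function could pass to `jsonify`. The error messages, the `/health` body, and the assumption that `ask_question()` returns a dict with `answer` and `sources` are illustrative, inferred from the response shape above.

```python
def handle_ask(payload, ask_question):
    """Validate the request body, call the RAG layer, and shape the response."""
    if not isinstance(payload, dict):
        return 400, {"error": "Request body must be a JSON object."}
    question = payload.get("question")
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "Field 'question' must be a non-empty string."}
    try:
        result = ask_question(question.strip())
    except Exception:
        # Hide internal details from the client; log the exception server-side.
        return 500, {"error": "Failed to generate an answer."}
    return 200, {"answer": result["answer"], "sources": result["sources"]}


def handle_health():
    """Liveness check; the body shown here is an assumed shape."""
    return 200, {"status": "ok"}
```

Keeping validation outside the route functions matches the layering in the diagram: the HTTP layer stays thin, and the request logic can be unit-tested with a stubbed `ask_question`.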

