**🔒 Proprietary & All Rights Reserved**

**© 2025 Sweety Seelam.** This work is proprietary and protected by copyright. All content, models, code, and visuals are © 2025 Sweety Seelam.
No part of this project, app, code, or analysis may be copied, reproduced, distributed, or used for any purpose, commercial or otherwise, without explicit written permission from the author.

-------------

# StreamIntel360: A Multi-Agent RAG Platform for Streaming Content & Revenue Intelligence

-----------

# 04 – RAG & Agent Evaluation for StreamIntel360

This notebook evaluates the **retrieval + multi-agent reasoning** stack behind StreamIntel360.

**Goals:**

- Reuse the Netflix titles catalog and text corpus built in earlier notebooks.
- Build (or rebuild) an embedding index using SentenceTransformers + FAISS.
- Define a small evaluation set of **natural-language queries** and **expected relevant titles**.
- Compute simple retrieval metrics such as **Hit@k** (does any relevant title appear in the top-k?).
- (Optional) Call the running FastAPI backend (`/api/chat`) to inspect full multi-agent answers for a query.


In [1]:
# Install dependencies
!pip install pandas numpy
!pip install sentence-transformers faiss-cpu requests typing_extensions
!pip install tf-keras



In [2]:
# Core imports
import pandas as pd
import numpy as np
from pathlib import Path
from typing import List, Dict, Any

# Embeddings + vector index
from sentence_transformers import SentenceTransformer
import faiss

# Optional: talk to the running backend
import requests

pd.set_option("display.max_colwidth", 200)




In [3]:
# Paths
DATA_DIR = Path("..") / "data" / "raw"
FILE_PATH = DATA_DIR / "netflix_titles.csv"

FILE_PATH, FILE_PATH.exists()

(WindowsPath('../data/raw/netflix_titles.csv'), True)

In [5]:
# Load the Netflix titles catalog
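# Use latin-1 to avoid UnicodeDecodeError on some Kaggle CSVs.
# Caveat: latin-1 accepts any byte, so it can mis-decode UTF-8 punctuation
# (e.g. curly apostrophes) if the file is actually UTF-8.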
df = pd.read_csv(FILE_PATH, encoding="latin-1")

# Drop any extra unnamed columns created by bad separators
df = df.loc[:, ~df.columns.str.contains("^Unnamed")]

df.head()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Ma...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Action & Adventure","To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV Comedies","In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life."


## 1. Build Text Corpus for RAG

We construct a **corpus string per title** that combines the most important metadata for semantic search (title, type, genres, country, year, description).

In [6]:
def build_text_variant(row, variant: str = "baseline") -> str:
    """Build different corpus variants for semantic search experiments.

    * "title_only" - just the title.
    * "title_description" - title + description.
    * "baseline" - rich context (title, type, genres, country, year, description).
    """
    if variant == "title_only":
        return str(row.get("title", "")).strip()

    if variant == "title_description":
        title = str(row.get("title", "")).strip()
        desc = str(row.get("description", "")).strip()
        return f"Title: {title} | Description: {desc}"

    if variant == "baseline":
        parts = [
            f"Title: {row.get('title', '')}",
            f"Type: {row.get('type', '')}",
            f"Genres: {row.get('listed_in', '')}",
            f"Country: {row.get('country', '')}",
            f"Year: {row.get('release_year', '')}",
            f"Description: {row.get('description', '')}",
        ]
        # Note: every part already carries its label, so this filter never
        # drops a field with an empty value (e.g. "Country: " is kept).
        return " | ".join(p for p in parts if p)

    raise ValueError(f"Unknown variant: {variant}")


# Clean key columns and build the corpus text
df = df.copy()
for col in ["title", "description", "listed_in", "type", "country"]:
    if col not in df.columns:
        df[col] = ""
    df[col] = df[col].fillna("").astype(str)

VARIANT = "baseline"  # you can change this to "title_only" or "title_description"
df["corpus_text"] = df.apply(build_text_variant, axis=1, variant=VARIANT)
df[["title", "corpus_text"]].head(3)

,title,corpus_text
0,Dick Johnson Is Dead,"Title: Dick Johnson Is Dead | Type: Movie | Genres: Documentaries | Country: United States | Year: 2020 | Description: As her father nears the end of his life, filmmaker Kirsten Johnson stages his..."
1,Blood & Water,"Title: Blood & Water | Type: TV Show | Genres: International TV Shows, TV Dramas, TV Mysteries | Country: South Africa | Year: 2021 | Description: After crossing paths at a party, a Cape Town teen..."
2,Ganglands,"Title: Ganglands | Type: TV Show | Genres: Crime TV Shows, International TV Shows, TV Action & Adventure | Country: | Year: 2021 | Description: To protect his family from a powerful drug lord, sk..."
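
In the baseline variant every part is built with its label attached (for example `Country: `), so the `if p` filter never removes an empty field; that is why the Ganglands row above still shows `Country: |`. The following is a minimal sketch of one way to skip empty fields, assuming that is the desired behavior. `build_text_skip_empty` and `BASELINE_FIELDS` are hypothetical names, not part of the notebook's pipeline, and the example row is abbreviated.

```python
from typing import Any, Mapping

# Hypothetical field list mirroring the "baseline" variant: (label, column name).
BASELINE_FIELDS = [
    ("Title", "title"),
    ("Type", "type"),
    ("Genres", "listed_in"),
    ("Country", "country"),
    ("Year", "release_year"),
    ("Description", "description"),
]


def build_text_skip_empty(row: Mapping[str, Any]) -> str:
    """Join 'Label: value' parts, leaving out fields whose value is empty."""
    parts = []
    for label, column in BASELINE_FIELDS:
        value = str(row.get(column, "")).strip()
        if value:  # filter on the value, before the label is attached
            parts.append(f"{label}: {value}")
    return " | ".join(parts)


# Abbreviated example row; the description is shortened for illustration.
example_row = {
    "title": "Ganglands",
    "type": "TV Show",
    "listed_in": "Crime TV Shows",
    "country": "",  # empty after fillna("")
    "release_year": 2021,
    "description": "A skilled thief is pulled into a turf war.",
}
print(build_text_skip_empty(example_row))
```

The sketch relies on missing values already being empty strings, as the cleaning loop above ensures for the text columns. Whether dropping empty fields helps retrieval is an empirical question that the Hit@k evaluation later in this notebook can check.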


## 2. Build Embedding Index

We use a local SentenceTransformers model (`all-MiniLM-L6-v2`) so this notebook can run **without any external API keys**.

In [7]:
model_name = "all-MiniLM-L6-v2"
embed_model = SentenceTransformer(model_name)
print(f"Loaded SentenceTransformer model: {model_name}")

Loaded SentenceTransformer model: all-MiniLM-L6-v2


In [9]:
# For speed, you can subsample if the dataset is large.
# Set MAX_ROWS = None to embed everything.
MAX_ROWS = None  # e.g., 20000 for faster experiments

if MAX_ROWS is not None and len(df) > MAX_ROWS:
    df_sample = df.sample(n=MAX_ROWS, random_state=42).reset_index(drop=True)
else:
    df_sample = df.reset_index(drop=True)

texts = df_sample["corpus_text"].tolist()
len(texts)

8809

In [10]:
# This step can take a few minutes depending on dataset size and hardware.
embeddings = embed_model.encode(texts, batch_size=64, show_progress_bar=True)
embeddings.shape

(8809, 384)

In [11]:
d = embeddings.shape[1]  # embedding dimensionality
index = faiss.IndexFlatL2(d)
index.add(embeddings)

print("FAISS index size:", index.ntotal)

FAISS index size: 8809


## 3. Similarity Search Helper

We define a small helper that returns the **top-k nearest titles** for a natural-language query.

In [12]:
def search_similar(query: str, k: int = 10) -> List[Dict[str, Any]]:
    """Encode a query and return top-k similar titles with metadata."""
    q_emb = embed_model.encode([query])
    distances, indices = index.search(q_emb, k)
    results = []
    for rank, (dist, idx) in enumerate(zip(distances[0], indices[0]), start=1):
        row = df_sample.iloc[int(idx)]
        results.append(
            {
                "rank": rank,
                "title": row.get("title", ""),
                "type": row.get("type", ""),
                "distance": float(dist),
                "description": row.get("description", ""),
                "genres": row.get("listed_in", ""),
                "year": int(row.get("release_year"))
                if pd.notna(row.get("release_year"))
                else None,
            }
        )
    return results


# Quick smoke test
test_query = "A dark crime thriller about a serial killer in a big city"
for r in search_similar(test_query, k=5):
    print(f"#{r['rank']} | {r['title']} ({r['year']}) [{r['type']}]")
    print(f"Genres: {r['genres']}")
    print(f"Distance: {r['distance']:.4f}")
    print("-" * 80)

#1 | Dark Crimes (2016) [Movie]
Genres: Dramas, Thrillers
Distance: 0.5851
--------------------------------------------------------------------------------
#2 | Small Town Crime (2017) [Movie]
Genres: Thrillers
Distance: 0.5972
--------------------------------------------------------------------------------
#3 | November Criminals (2017) [Movie]
Genres: Dramas, Thrillers
Distance: 0.7341
--------------------------------------------------------------------------------
#4 | Night Stalker: The Hunt for a Serial Killer (2021) [TV Show]
Genres: Crime TV Shows, Docuseries
Distance: 0.7428
--------------------------------------------------------------------------------
#5 | Twin Murders: the Silence of the White City (2020) [Movie]
Genres: International Movies, Thrillers
Distance: 0.8044
--------------------------------------------------------------------------------
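
`IndexFlatL2` reports squared Euclidean distances, so smaller values mean closer matches. If the embeddings are unit-length (an assumption here; it can be checked with `np.linalg.norm(embeddings, axis=1)`), a squared distance `d` corresponds to a cosine similarity of `1 - d / 2`. The sketch below verifies that identity in plain Python; `l2sq_to_cosine` and `normalize` are hypothetical helpers, not part of the notebook.

```python
import math


def l2sq_to_cosine(dist_sq: float) -> float:
    """Cosine similarity implied by a squared L2 distance between unit vectors.

    For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b).
    """
    return 1.0 - dist_sq / 2.0


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Check the identity on two arbitrary vectors after normalizing them.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])
dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))
assert math.isclose(l2sq_to_cosine(dist_sq), cosine)

# Under the unit-norm assumption, the top smoke-test distance of 0.5851
# corresponds to a cosine similarity of about 0.71.
print(round(l2sq_to_cosine(0.5851), 2))
```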


## 4. Define a Tiny Evaluation Set

We create a **hand-crafted mini-benchmark** of queries, each paired with a few titles that should be relevant. This is not a full offline benchmark, but it gives a feel for how well retrieval behaves.

In [13]:
evaluation_queries = [
    {
        "id": "crime_thriller",
        "query": "A gritty crime thriller about a serial killer investigated by stubborn detectives.",
        "relevant_titles": [
            "Mindhunter",
            "Zodiac",
            "Se7en",
            "The Sinner",
        ],
    },
    {
        "id": "feel_good_family",
        "query": "A heartwarming family movie about kids and their loyal dog, with lots of emotion.",
        "relevant_titles": [
            "A Dog's Purpose",
            "Benji",
            "Marley & Me",
            "Because of Winn-Dixie",
        ],
    },
    {
        "id": "teen_romcom",
        "query": "A light teen romantic comedy set in high school, full of crushes and drama.",
        "relevant_titles": [
            "To All the Boys I've Loved Before",
            "The Kissing Booth",
            "Mean Girls",
        ],
    },
    {
        "id": "sci_fi_space",
        "query": "A science fiction story about astronauts exploring deep space and facing unknown threats.",
        "relevant_titles": [
            "Interstellar",
            "Gravity",
            "The Cloverfield Paradox",
        ],
    },
]

len(evaluation_queries)

4
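
A hit is only possible when an expected title exists in the catalog under the same spelling, so it can help to check coverage before reading the metric. The following is a minimal sketch, assuming exact case-insensitive title matching as in the evaluation code. `catalog_coverage` is a hypothetical helper, and the toy catalog is made up for illustration.

```python
from typing import Dict, Iterable, List


def catalog_coverage(
    relevant_titles: Iterable[str],
    catalog_titles: Iterable[str],
) -> Dict[str, List[str]]:
    """Split expected titles into those present in / missing from the catalog."""
    catalog = {t.strip().lower() for t in catalog_titles}
    present, missing = [], []
    for title in relevant_titles:
        (present if title.strip().lower() in catalog else missing).append(title)
    return {"present": present, "missing": missing}


# Toy catalog for illustration only (not the real dataset contents).
toy_catalog = ["Benji", "The Midnight Sky", "Dark Crimes"]
report = catalog_coverage(["Benji", "Marley & Me"], toy_catalog)
print(report)
```

In the notebook this could be called with `q["relevant_titles"]` and `df_sample["title"]` for each entry of `evaluation_queries`. A query whose expected titles are all missing cannot score a hit, regardless of the embedding model.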

### Hit@k Metric

`Hit@k` answers the question: **For each query, does at least one relevant title appear in the top-k retrieved items?**
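
As a toy illustration of the definition, independent of the FAISS index, the sketch below computes Hit@k on made-up rankings. `hit_at_k` and `mean_hit_at_k` are hypothetical names used only here.

```python
from typing import List, Sequence


def hit_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> bool:
    """True if any relevant title is among the first k retrieved titles."""
    relevant_set = {t.lower() for t in relevant}
    return any(t.lower() in relevant_set for t in retrieved[:k])


def mean_hit_at_k(runs: List[dict], k: int) -> float:
    """Fraction of queries with at least one hit in the top-k."""
    hits = [hit_at_k(r["retrieved"], r["relevant"], k) for r in runs]
    return sum(hits) / len(hits)


# Made-up rankings for two queries, for illustration only.
toy_runs = [
    {"retrieved": ["A", "B", "C", "D"], "relevant": ["C"]},
    {"retrieved": ["E", "F", "G", "H"], "relevant": ["Z"]},
]
print(mean_hit_at_k(toy_runs, k=2))  # 0.0: "C" sits at rank 3
print(mean_hit_at_k(toy_runs, k=3))  # 0.5: first query hits, second never does
```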

In [14]:
def evaluate_hit_at_k(
    queries: List[Dict[str, Any]],
    k: int = 10,
) -> pd.DataFrame:
    """Compute Hit@k for a small set of queries.

    Returns a DataFrame with per-query results and an aggregate mean.
    """
    records = []

    for q in queries:
        qid = q["id"]
        query_text = q["query"]
        relevant = [t.lower() for t in q["relevant_titles"]]

        results = search_similar(query_text, k=k)
        retrieved_titles = [r["title"].lower() for r in results]

        hit = any(rt in retrieved_titles for rt in relevant)
        hit_rank = None
        for r in results:
            if r["title"].lower() in relevant:
                hit_rank = r["rank"]
                break

        records.append(
            {
                "id": qid,
                "query": query_text,
                "hit": hit,
                "hit_rank": hit_rank,
                "relevant_titles": ", ".join(q["relevant_titles"]),
                "retrieved_titles": ", ".join(r["title"] for r in results),
            }
        )

    df_eval = pd.DataFrame(records)
    df_eval.loc["mean"] = {
        "id": "mean",
        "query": "",
        "hit": df_eval["hit"].mean(),
        "hit_rank": df_eval["hit_rank"].mean(),
        "relevant_titles": "",
        "retrieved_titles": "",
    }
    return df_eval

In [15]:
# Evaluate for different k values
for k in [3, 5, 10]:
    print(f"\n=== Hit@{k} ===")
    df_eval_k = evaluate_hit_at_k(evaluation_queries, k=k)
    display(df_eval_k)


=== Hit@3 ===


,id,query,hit,hit_rank,relevant_titles,retrieved_titles
0,crime_thriller,A gritty crime thriller about a serial killer investigated by stubborn detectives.,0.0,,"Mindhunter, Zodiac, Se7en, The Sinner","Dark Crimes, Unknown Origins, A Kind of Murder"
1,feel_good_family,"A heartwarming family movie about kids and their loyal dog, with lots of emotion.",1.0,1.0,"A Dog's Purpose, Benji, Marley & Me, Because of Winn-Dixie","Benji, Life in the Doghouse, Dog Gone Trouble"
2,teen_romcom,"A light teen romantic comedy set in high school, full of crushes and drama.",0.0,,"To All the Boys I've Loved Before, The Kissing Booth, Mean Girls","Must Be... Love, The Last Summer, Good Kids"
3,sci_fi_space,A science fiction story about astronauts exploring deep space and facing unknown threats.,0.0,,"Interstellar, Gravity, The Cloverfield Paradox","A StoryBots Space Adventure, 3022, The Search for Life in Space"
mean,mean,,0.25,1.0,,



=== Hit@5 ===


,id,query,hit,hit_rank,relevant_titles,retrieved_titles
0,crime_thriller,A gritty crime thriller about a serial killer investigated by stubborn detectives.,0.0,,"Mindhunter, Zodiac, Se7en, The Sinner","Dark Crimes, Unknown Origins, A Kind of Murder, Basic Instinct, November Criminals"
1,feel_good_family,"A heartwarming family movie about kids and their loyal dog, with lots of emotion.",1.0,1.0,"A Dog's Purpose, Benji, Marley & Me, Because of Winn-Dixie","Benji, Life in the Doghouse, Dog Gone Trouble, Bitch, Pets United"
2,teen_romcom,"A light teen romantic comedy set in high school, full of crushes and drama.",0.0,,"To All the Boys I've Loved Before, The Kissing Booth, Mean Girls","Must Be... Love, The Last Summer, Good Kids, Adventures in Public School, Back to School"
3,sci_fi_space,A science fiction story about astronauts exploring deep space and facing unknown threats.,0.0,,"Interstellar, Gravity, The Cloverfield Paradox","A StoryBots Space Adventure, 3022, The Search for Life in Space, Lockout, The Midnight Sky"
mean,mean,,0.25,1.0,,



=== Hit@10 ===


,id,query,hit,hit_rank,relevant_titles,retrieved_titles
0,crime_thriller,A gritty crime thriller about a serial killer investigated by stubborn detectives.,0.0,,"Mindhunter, Zodiac, Se7en, The Sinner","Dark Crimes, Unknown Origins, A Kind of Murder, Basic Instinct, November Criminals, Inside the Mind of a Serial Killer, Memoir of a Murderer, The Investigator: A British Crime Story, Small Town Cr..."
1,feel_good_family,"A heartwarming family movie about kids and their loyal dog, with lots of emotion.",1.0,1.0,"A Dog's Purpose, Benji, Marley & Me, Because of Winn-Dixie","Benji, Life in the Doghouse, Dog Gone Trouble, Bitch, Pets United, Show Dogs, All Dogs Go to Heaven, Hotel for Dogs, A Champion Heart, Puppy Star Christmas"
2,teen_romcom,"A light teen romantic comedy set in high school, full of crushes and drama.",0.0,,"To All the Boys I've Loved Before, The Kissing Booth, Mean Girls","Must Be... Love, The Last Summer, Good Kids, Adventures in Public School, Back to School, The New Romantic, What's Up With Love?, Just Friends, Comedy High School, Nevertheless,"
3,sci_fi_space,A science fiction story about astronauts exploring deep space and facing unknown threats.,1.0,9.0,"Interstellar, Gravity, The Cloverfield Paradox","A StoryBots Space Adventure, 3022, The Search for Life in Space, Lockout, The Midnight Sky, A Year In Space, Alien Contact: Outer Space, Countdown: Inspiration4 Mission to Space, The Cloverfield P..."
mean,mean,,0.5,5.0,,
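
The `mean` row averages `hit_rank` over hits only, so misses do not lower it: at k=10 it is 5.0 (from ranks 1 and 9) even though two queries miss. One common way to fold misses in is the mean reciprocal rank, which scores a miss as 0. The sketch below uses the k=10 ranks from the table above; `mean_reciprocal_rank` is a hypothetical helper, not something the notebook computes.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(hit_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries, counting a miss (None) as 0."""
    if not hit_ranks:
        return 0.0
    return sum(1.0 / r for r in hit_ranks if r is not None) / len(hit_ranks)


# First-hit ranks at k=10, in order:
# crime_thriller, feel_good_family, teen_romcom, sci_fi_space
ranks_at_10 = [None, 1, None, 9]
print(round(mean_reciprocal_rank(ranks_at_10), 3))  # 0.278
```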


## 5. (Optional) Call the Running StreamIntel360 Backend

If your FastAPI backend is running on `http://localhost:8000`, you can send a query to the multi-agent `/api/chat` endpoint and compare the **retrieval-only** behavior to the full, reasoned answer.

In [18]:
BACKEND_URL = "http://127.0.0.1:8000"  # or "http://localhost:8000"

def call_streamintel_chat(prompt: str) -> str:
    """Call the /api/chat endpoint of the running backend."""
    url = f"{BACKEND_URL}/api/chat"
    payload = {
        "message": prompt,
        "history": [],
    }
    try:
        resp = requests.post(url, json=payload, timeout=60)
        resp.raise_for_status()
        data = resp.json()
        return data.get("answer", str(data))
    except Exception as e:
        print("Error calling backend:", e)
        return ""


# Example (only works if backend is running)
example_prompt = "Suggest a thriller movie where hero is a police officer in the US."
backend_answer = call_streamintel_chat(example_prompt)
print(backend_answer[:2000])

**Executive Summary: Thriller Movie Featuring a US Police Officer Hero**

**Overview:**  
A thriller centered on a US police officer as the protagonist aligns well with established audience interests in crime, suspense, and morally complex law enforcement stories. The concept benefits from proven appeal in key global markets, particularly if it emphasizes nuanced character development and fresh narrative angles. To succeed, the story should avoid overused tropes and incorporate unique elements, whether through setting, character background, or thematic depth, to stand out in a moderately saturated genre.

**Why this could work:**  
- Strong cultural resonance in North America and growing international interest in US police thrillers.  
- Popular themes of redemption, moral ambiguity, and psychological tension engage core thriller audiences.  
- Opportunity to differentiate through unique cop profiles, innovative settings, or contemporary social commentary.  
- Appeals to a broad age 
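
To compare retrieval-only results with the agent's answer, one simple check is whether the answer text names any of the retrieved titles. The following is a minimal sketch, assuming plain case-insensitive substring matching is good enough. `titles_mentioned` is a hypothetical helper, and the answer string is made up rather than real backend output.

```python
from typing import Iterable, List


def titles_mentioned(answer: str, titles: Iterable[str]) -> List[str]:
    """Return the titles that appear verbatim (case-insensitive) in the answer."""
    answer_lower = answer.lower()
    return [t for t in titles if t and t.lower() in answer_lower]


# Made-up answer text for illustration; not real backend output.
toy_answer = "Comparable titles include Dark Crimes and Small Town Crime."
retrieved = ["Dark Crimes", "Small Town Crime", "November Criminals"]
print(titles_mentioned(toy_answer, retrieved))
```

With the backend running, `answer` would be the string returned by `call_streamintel_chat` and `titles` the `title` fields from `search_similar`. Substring matching will misfire on very short titles, so the result is best treated as a rough signal.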

-----------

## 6. Summary & Next Steps

In this notebook we:

- Reused the **Netflix titles catalog** and constructed a rich text corpus for RAG.
- Embedded the catalog using **SentenceTransformers** and indexed it with **FAISS**.
- Defined a small hand-curated **evaluation set** and computed **Hit@k** metrics.
- Optionally called the **StreamIntel360 FastAPI backend** to inspect multi-agent answers.

This gives you a starting point to:

- Grow the evaluation set (more queries, more diverse genres and markets).
- Track retrieval quality over time as you tweak corpus construction or embedding models.
- Connect metrics here to real user behavior (e.g., which titles your agents actually surface in production).
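
Tracking quality across corpus variants or embedding models amounts to running the same queries through different retrievers and tabulating Hit@k for each. The sketch below uses stub retrievers with fixed rankings in place of real FAISS-backed search functions; `compare_retrievers`, `stub_baseline`, and `stub_title_only` are all hypothetical.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str, int], List[str]]  # (query, k) -> ranked titles


def compare_retrievers(
    retrievers: Dict[str, Retriever],
    queries: List[dict],
    k: int,
) -> Dict[str, float]:
    """Hit@k per named retriever over the same query set."""
    scores = {}
    for name, retrieve in retrievers.items():
        hits = 0
        for q in queries:
            relevant = {t.lower() for t in q["relevant_titles"]}
            top_k = retrieve(q["query"], k)[:k]
            hits += any(t.lower() in relevant for t in top_k)
        scores[name] = hits / len(queries)
    return scores


# Stub retrievers with fixed rankings, for illustration only.
def stub_baseline(query: str, k: int) -> List[str]:
    return ["Benji", "Show Dogs", "Pets United"][:k]


def stub_title_only(query: str, k: int) -> List[str]:
    return ["Show Dogs", "Pets United", "Benji"][:k]


toy_queries = [{"query": "family dog movie", "relevant_titles": ["Benji"]}]
scores = compare_retrievers(
    {"baseline": stub_baseline, "title_only": stub_title_only},
    toy_queries,
    k=2,
)
print(scores)
```

In practice each retriever would wrap its own index, for example one built per `VARIANT` value, and the query dictionaries would be the entries of `evaluation_queries`.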

-----------

## Summary

***“How well do our retrieval + agents actually perform as a system?”***

**What I did and why**

**1. Reloaded the Netflix catalog and rebuilt the same `corpus_text`.**

- Reason: ensure the evaluation uses the exact same text recipe the real RAG layer uses.

**2. Re-embedded the corpus using `all-MiniLM-L6-v2` and rebuilt the FAISS index.**

- Reason: keep the evaluation notebook self-contained, so it doesn’t depend on Notebook 2’s state.

**3. Defined `search_similar` again, this time as part of an evaluation harness.**

- Reason: metrics need a reusable function, not just eyeballed results.

**4. Created a tiny evaluation set of realistic queries with known “relevant titles”.**

- Example:

    - crime thriller → Mindhunter, Zodiac, Se7en, The Sinner

    - feel-good dog family → A Dog’s Purpose, Benji, etc.

- Reason: create a mini benchmark to measure retrieval quality in a structured way.

**5. Implemented the Hit@k metric and ran it for k = 3, 5, 10.**

- Hit@k asks: “Does at least one relevant title appear in the top-k results?”

- Results:

    - Hit@3 = 0.25

    - Hit@5 = 0.25

    - Hit@10 = 0.5

- Reason: quantify how often retrieval surfaces at least one target title in the top-k for each query type.

**6. Optionally called the running FastAPI backend /api/chat.**

- I sent: “Suggest a thriller movie where hero is a police officer in the US.”

- The backend returned a structured, multi-paragraph executive summary, showing that the LangGraph multi-agent stack is live.

- Reason: verify end-to-end behavior (retrieval + reasoning + summarization) through the actual backend, not just offline FAISS.

**What I achieved**

- I built a proper evaluation harness for the RAG + agent system, not just an embedding demo.

- We can now:

    - Track Hit@k as models, corpus variants, or index parameters change.

    - Spot weaknesses (e.g., the sci-fi query's first hit is only at rank 9, and the crime-thriller and teen rom-com queries miss entirely at k = 10).

    - Validate that the live backend agent responds coherently and matches the retrieved titles’ intent.

-------------

## Conclusion

- This notebook answers: “Is the semantic search and multi-agent system actually working, and how good is it?”

- We now have a repeatable benchmark plus a live-backend sanity check that can be re-run after any model or corpus change.