<a href="https://colab.research.google.com/github/Firu3/Movie-Recommender/blob/main/movie_recommender_advanced.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers sentence-transformers requests pandas numpy
!pip install tmdbsimple

In [None]:
import requests
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline
import tmdbsimple as tmdb

In [None]:
from google.colab import userdata
api_key = userdata.get('API_KEY')

In [None]:
# Fetch TMDB's genre list so genre IDs can be mapped to genre names
genres_url = f"https://api.themoviedb.org/3/genre/movie/list?api_key={api_key}&language=en-US"

response = requests.get(genres_url, timeout=10)
response.raise_for_status()  # fail early on a bad API key or a network error
genres_data = response.json()["genres"]

# Create a dictionary: {id: name}
genre_map = {genre["id"]: genre["name"] for genre in genres_data}

genre_map

{28: 'Action',
 12: 'Adventure',
 16: 'Animation',
 35: 'Comedy',
 80: 'Crime',
 99: 'Documentary',
 18: 'Drama',
 10751: 'Family',
 14: 'Fantasy',
 36: 'History',
 27: 'Horror',
 10402: 'Music',
 9648: 'Mystery',
 10749: 'Romance',
 878: 'Science Fiction',
 10770: 'TV Movie',
 53: 'Thriller',
 10752: 'War',
 37: 'Western'}
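
The ID-to-name lookup can be wrapped in a small helper that tolerates IDs missing from the map. This is only a sketch: `genre_names` and `genre_map_sample` are hypothetical names introduced here, and the sample dictionary is a hand-copied subset of the output above.

```python
# Hand-copied subset of the genre map printed above (illustration only).
genre_map_sample = {28: "Action", 878: "Science Fiction", 12: "Adventure"}

def genre_names(genre_ids, mapping, unknown="Unknown"):
    """Map a list of TMDB genre IDs to names, falling back to `unknown` for missing IDs."""
    return [mapping.get(g, unknown) for g in genre_ids]

names = genre_names([28, 878, 12], genre_map_sample)
print(" ".join(names))
```

Using `dict.get` with a fallback avoids a `KeyError` if a movie ever carries an ID that the genre list does not contain.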

In [None]:
tmdb.API_KEY = api_key
tmdb_request = tmdb.Movies()

In [None]:
def fetch_movies(pages=3):
  """Fetch `pages` pages of TMDB's popular movies and return them as a DataFrame."""
  movies = []
  for page in range(1, pages + 1):
    res = requests.get("https://api.themoviedb.org/3/movie/popular",
                       params={
                          "api_key": api_key,
                          "page": page,
                          "language": "en-US"},
                       timeout=10)
    res.raise_for_status()  # stop on HTTP errors instead of failing later on a missing "results" key
    data = res.json()
    for movie in data["results"]:
      movies.append({
          "title": movie["title"],
          "overview": movie["overview"],
          "vote_average": movie["vote_average"],
          "release_date": movie.get("release_date", "N/A"),
          "genre_ids": movie["genre_ids"]
      })
  return pd.DataFrame(movies)

movies_df = fetch_movies(10)
movies_df

Unnamed: 0,title,overview,vote_average,release_date,genre_ids
0,Our Fault,Jenna and Lion's wedding brings about the long...,7.681,2025-10-15,"[10749, 18]"
1,Captain Hook - The Cursed Tides,In the aftermath of a devastating defeat by hi...,4.800,2025-07-11,"[12, 28, 27]"
2,War of the Worlds,Will Radford is a top analyst for Homeland Sec...,4.375,2025-07-29,"[878, 53]"
3,Stolen Girl,"In 1993, Maureen’s six-year-old daughter Amina...",6.300,2025-09-04,"[53, 28, 12]"
4,Hunting Grounds,"Desperate to find refuge for her children, Chl...",5.700,2025-05-16,"[28, 53]"
...,...,...,...,...,...
195,Pirates of the Caribbean: On Stranger Tides,Captain Jack Sparrow crosses paths with a woma...,6.562,2011-05-15,"[12, 28, 14]"
196,Captain America: Brave New World,After meeting with newly elected U.S. Presiden...,6.018,2025-02-12,"[28, 53, 878]"
197,Red One,After Santa Claus (codename: Red One) is kidna...,7.037,2024-10-31,"[28, 35, 14]"
198,Cholo Zombies,A possessed artifact is traced back to Dr. Bla...,0.000,2024-08-20,"[27, 35]"
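
Because the popular list is fetched one page at a time, the same film could in principle appear on two pages if rankings shift between requests. That is an assumption, not something observed in the output above. A minimal sketch of removing such repeats from the raw records follows; `dedupe_movies` and the sample records are hypothetical.

```python
def dedupe_movies(records):
    """Keep the first occurrence of each (title, release_date) pair, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["title"], rec.get("release_date", "N/A"))
        if key in seen:
            continue  # already collected this movie from an earlier page
        seen.add(key)
        unique.append(rec)
    return unique

# Made-up records, only to exercise the helper.
sample_records = [
    {"title": "Example Movie A", "release_date": "2024-01-01"},
    {"title": "Example Movie B", "release_date": "2024-02-02"},
    {"title": "Example Movie A", "release_date": "2024-01-01"},
]
unique_records = dedupe_movies(sample_records)
print(len(unique_records))
```

On the DataFrame itself, `movies_df.drop_duplicates(subset=["title", "release_date"])` should have the same effect.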


In [None]:
model = SentenceTransformer('all-mpnet-base-v2')
movies_df["overview"] = movies_df["overview"].fillna("")

# Build one text per movie from its title, overview, and genre names.
# m['title'] and m['overview'] are strings; m['genre_ids'] is a list of integers,
# so each ID is looked up in genre_map and the names are joined with spaces,
# which turns [28, 878, 12] → "Action Science Fiction Adventure"

texts = [
    f"{m['title']}. {m['overview']}. {' '.join(genre_map[g] for g in m['genre_ids'])}."
    for _, m in movies_df.iterrows()
]


movie_embeddings = model.encode(texts,convert_to_tensor=True)
print("✅ Embeddings ready! Shape:", movie_embeddings.shape)

print(texts)

✅ Embeddings ready! Shape: torch.Size([200, 768])
["Our Fault. Jenna and Lion's wedding brings about the long-awaited reunion between Noah and Nick after their breakup. Nick's inability to forgive Noah stands as an insurmountable barrier. He, heir to his grandfather's businesses, and she, starting her professional life, resist fueling a flame that's still alive. But now that their paths have crossed again, will love be stronger than resentment?. Romance Drama.", "Captain Hook - The Cursed Tides. In the aftermath of a devastating defeat by his archnemesis Admiral Smee, the notorious Captain James Hook finds refuge in the coastal town of Eldritch Landing, where he forms an unlikely alliance with Silas Blackweather, a local blacksmith seeking retribution for his sister's murder. As they evade Smee's Redcoat Soldiers in the island's dense woodland, ruthless sword fights, ancient curses and conflicting motives will challenge their shared quest for revenge. Together, Hook and Silas navigate 

In [None]:
def recommend_movies(user_query, top_k=5):
  """Return the top_k movies whose embeddings are most similar to the query."""
  query_embedding = model.encode(user_query, convert_to_tensor=True)
  # Cosine similarity between the query and every movie; move to CPU for NumPy
  scores = util.cos_sim(query_embedding, movie_embeddings)[0].cpu()
  # Indices of the highest-scoring movies, best first
  top_results = np.argsort(-scores.numpy())[:top_k]

  recs = []
  for idx in top_results:
    idx = int(idx)
    row = movies_df.iloc[idx]
    recs.append({
        "title": row["title"],
        "overview": row["overview"],
        "vote_average": row["vote_average"],
        "release_date": row.get("release_date", "N/A"),
        "genre": [genre_map[g] for g in row["genre_ids"]],
        "score": float(scores[idx]),
    })

  return pd.DataFrame(recs)
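
The ranking step in `recommend_movies` comes down to two operations: cosine similarity between the query and each movie vector, then a descending sort. The NumPy sketch below shows both on toy 2-D vectors. `cosine_scores`, `top_k_indices`, and the toy data are hypothetical stand-ins for illustration; the notebook itself relies on `util.cos_sim`.

```python
import numpy as np

def cosine_scores(query_vec, matrix):
    """Cosine similarity between one query vector and every row of `matrix`."""
    query_vec = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    # Guard against division by zero for all-zero vectors
    return (matrix @ query_vec) / np.maximum(norms, 1e-12)

def top_k_indices(scores, k):
    """Indices of the k largest scores, best first."""
    k = min(k, len(scores))
    return np.argsort(-scores)[:k]

# Toy "embeddings": row 0 matches the query exactly, row 1 is orthogonal to it.
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query = np.array([1.0, 0.0])

toy_scores = cosine_scores(toy_query, toy_embeddings)
toy_top = top_k_indices(toy_scores, k=2)
print(toy_top.tolist())
```

Row 0 scores 1.0, row 2 scores about 0.707, and row 1 scores 0.0, so the two best indices are 0 and 2.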

In [None]:
from transformers import pipeline

chatbot = pipeline("text2text-generation", model="google/flan-t5-large")

def movie_assistant(user_query):
    recs = recommend_movies(user_query, top_k=3)

    movies_text = "\n".join([
        f"- {r['title']} ({r['vote_average']}/10)" for _, r in recs.iterrows()
    ])

    prompt = f"""
User asked: "{user_query}"

Recommended movies:
{movies_text}

Write a short, natural, and friendly paragraph (2–4 sentences) recommending these movies.
Avoid repetition, and do not make up new details.
"""

    response = chatbot(
        prompt.strip(),
        max_new_tokens=80,       # keep it concise
        do_sample=False,         # greedy decoding: deterministic, so temperature would be ignored
    )[0]["generated_text"]

    print("🎬 Movie Assistant:\n", response.strip())
    return recs


Device set to use cuda:0


In [None]:
movie_assistant("I want a teenager funny movie")



🎬 Movie Assistant:
 Inside Out 2 (7.552/10) is a sequel to the original Inside Out. It is a comedy about a teenage girl who has a crush on a boy. The Twits (6.6/10) is a comedy about a teenage girl who has a crush on a boy. It is a comedy about a teenage girl who has a crush


Unnamed: 0,title,overview,vote_average,release_date,genre,score
0,Inside Out 2,Teenager Riley's mind headquarters is undergoi...,7.552,2024-06-11,"[Animation, Adventure, Comedy, Family]",0.512638
1,The Toxic Avenger Unrated,"When a downtrodden janitor, Winston Gooze, is ...",6.34,2025-08-28,"[Action, Comedy, Science Fiction]",0.452172
2,The Twits,"When the meanest, nastiest villains pull a tri...",6.6,2025-10-17,"[Animation, Comedy, Family, Fantasy]",0.440841
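
The generated paragraph above describes plots that the prompt never supplied, since the prompt lists only titles and ratings. One possible mitigation, offered as an untested assumption rather than a verified fix, is to include each movie's genres and a shortened overview in the prompt. `build_prompt`, `max_overview_chars`, and the sample records below are hypothetical.

```python
def build_prompt(user_query, recs, max_overview_chars=200):
    """Build a prompt that carries title, rating, genres, and a shortened overview per movie."""
    lines = []
    for r in recs:
        overview = r["overview"].strip()
        if len(overview) > max_overview_chars:
            # Shorten long overviews so the prompt stays compact
            overview = overview[:max_overview_chars].rstrip() + "..."
        genres = ", ".join(r["genre"])
        lines.append(f"- {r['title']} ({r['vote_average']}/10; {genres}): {overview}")
    movies_text = "\n".join(lines)
    return (
        f'User asked: "{user_query}"\n\n'
        f"Recommended movies:\n{movies_text}\n\n"
        "Write a short, friendly paragraph (2-4 sentences) recommending these movies. "
        "Use only the details listed above."
    )

# Made-up records, only to show the resulting prompt shape.
sample_recs = [
    {"title": "Example Movie A", "vote_average": 7.5,
     "genre": ["Comedy", "Family"], "overview": "A placeholder overview. " * 20},
    {"title": "Example Movie B", "vote_average": 6.1,
     "genre": ["Comedy"], "overview": "Another placeholder overview."},
]
sample_prompt = build_prompt("I want a teenager funny movie", sample_recs)
print(sample_prompt)
```

With the notebook's DataFrame, the records could be obtained via `recs.to_dict("records")` before calling the helper.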
