<div align="center">
    <h1>Real-Time Movie Recommender System (IMDB Dataset)</h1>
</div>

Using ChromaDB, Hugging Face Embeddings & ChatGroq API

# 🎯 Key Features

### Core Capabilities
- ✨ **Hybrid Recommendation System**
    - Content-Based Filtering
    - Collaborative Filtering
    - Seamless Integration of Both Approaches

### Advanced Functionality  
- 🔄 **Real-Time Personalization**
    - Dynamic User Behavior Analysis
    - Adaptive Feedback Integration
    - Continuous Learning System

### Rich Data Integration
- 🎬 **Comprehensive Movie Metadata**
    - Movie Posters & Visual Assets
    - Genre Classification
    - Cast & Crew Information
    - Performance Metrics (IMDB Scores, Votes)
    - Box Office Statistics

### AI Enhancement
- 🤖 **ChatGroq-Powered Intelligence**
    - Natural Language Explanations
    - Contextual Recommendation Logic
    - Human-Like Reasoning

### Production Ready
- ⚡ **Enterprise-Grade Infrastructure**
    - FastAPI Backend
    - Persistent ChromaDB Storage
    - Scalable Architecture

# Step 1: Load & Process the IMDB Dataset

In [1]:
import pandas as pd

# Load IMDB dataset
df = pd.read_csv("Data/imdb_top_1000.csv")

# Select relevant features
df = df[["Series_Title", "Genre", "IMDB_Rating", "Overview", "Director",
         "Star1", "Star2", "Star3", "Star4", "No_of_Votes", "Gross"]]

# Handle missing values
df.fillna("", inplace=True)

print(df.info())
print(df.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Series_Title  1000 non-null   object 
 1   Genre         1000 non-null   object 
 2   IMDB_Rating   1000 non-null   float64
 3   Overview      1000 non-null   object 
 4   Director      1000 non-null   object 
 5   Star1         1000 non-null   object 
 6   Star2         1000 non-null   object 
 7   Star3         1000 non-null   object 
 8   Star4         1000 non-null   object 
 9   No_of_Votes   1000 non-null   int64  
 10  Gross         1000 non-null   object 
dtypes: float64(1), int64(1), object(9)
memory usage: 86.1+ KB
None
               Series_Title                 Genre  IMDB_Rating  \
0  The Shawshank Redemption                 Drama          9.3   
1             The Godfather          Crime, Drama          9.2   
2           The Dark Knight  Action, Crime, Drama          9.0   
3  
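The `df.info()` output above reports `Gross` as an `object` column, so it cannot be used for numeric comparisons as loaded. A minimal sketch of a converter follows; the helper name `parse_gross` is hypothetical, and the handling of thousands separators is an assumption, since the preview does not show how the raw values are formatted.

```python
import math


def parse_gross(value):
    """Convert a gross value such as "28,341,469" to a float, or None if missing."""
    if value is None:
        return None
    # Assumption: raw values may carry thousands separators and stray whitespace
    text = str(value).replace(",", "").strip()
    if not text:
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    # Treat NaN (pandas' marker for missing data) as missing too
    return None if math.isnan(number) else number


print(parse_gross("28,341,469"))
print(parse_gross(""))
```

Applied as `df["Gross"].map(parse_gross)`, this would yield a float column with missing values preserved instead of empty strings.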

# Step 2: Convert Movies into Embeddings

In [2]:
# (A) Generate Textual Representations
df["movie_description"] = df.apply(
    lambda row: f"""{row['Series_Title']} is a {row['Genre']} movie, 
    directed by {row['Director']}, 
    starring {', '.join([row['Star1'], row['Star2'], row['Star3'], row['Star4']])}.
    It has an IMDB rating of {row['IMDB_Rating']} and {row['No_of_Votes']} votes.
    Overview: {row['Overview']}""".replace('\n    ', ' ').strip(),
    axis=1
)


In [3]:
# (B) Convert Descriptions into Embeddings
from sentence_transformers import SentenceTransformer

# Load Hugging Face transformer model
hf_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Generate embeddings
df["embeddings"] = df["movie_description"].apply(lambda x: hf_model.encode(x))

print("✅ Movie embeddings generated!")


✅ Movie embeddings generated!


In [4]:
df.head()

| | Series_Title | Genre | IMDB_Rating | Overview | Director | Star1 | Star2 | Star3 | Star4 | No_of_Votes | Gross | movie_description | embeddings |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | Drama | 9.3 | Two imprisoned men bond over a number of years... | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 2343110 | 28341469 | The Shawshank Redemption is a Drama movie, di... | [-0.08365406, -0.046357177, -0.091286466, -0.0... |
| 1 | The Godfather | Crime, Drama | 9.2 | An organized crime dynasty's aging patriarch t... | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 1620367 | 134966411 | The Godfather is a Crime, Drama movie, direct... | [-0.09685207, -0.014758187, -0.08018005, 0.001... |
| 2 | The Dark Knight | Action, Crime, Drama | 9.0 | When the menace known as the Joker wreaks havo... | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 2303232 | 534858444 | The Dark Knight is a Action, Crime, Drama movi... | [0.0142227635, -0.004213256, -0.13376291, 0.01... |
| 3 | The Godfather: Part II | Crime, Drama | 9.0 | The early life and career of Vito Corleone in ... | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 1129952 | 57300000 | The Godfather: Part II is a Crime, Drama movie... | [-0.047394026, 0.021395307, -0.0040077716, 0.0... |
| 4 | 12 Angry Men | Crime, Drama | 9.0 | A jury holdout attempts to prevent a miscarria... | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 689845 | 4360000 | 12 Angry Men is a Crime, Drama movie, directe... | [-0.033988014, 0.030317575, -0.121169224, 0.00... |
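Each row now carries an embedding vector, and retrieval amounts to finding the vectors closest to a query vector. The sketch below illustrates that idea with a brute-force cosine-similarity scan over toy 3-dimensional vectors. The function names and the toy data are hypothetical, and a vector store would normally use an index and its own configured distance metric rather than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, vectors, k=2):
    """Return the k (name, score) pairs with the highest cosine similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real 384-dimensional embeddings
toy_vectors = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.9, 0.1, 0.0],
    "movie_c": [0.0, 1.0, 0.0],
}

print(top_k_similar([1.0, 0.0, 0.0], toy_vectors, k=2))
```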


# Step 3: Store Movies in ChromaDB for Retrieval

In [5]:
import os

# Define the path to store the ChromaDB locally
db_path = "chroma_db_movies"
os.makedirs(db_path, exist_ok=True)  # Create directory if it doesn't exist

In [None]:
# (A) Initialize ChromaDB
from chromadb import PersistentClient
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

chroma_client = PersistentClient(path=db_path)

# Define embedding function
hf_embeddings = SentenceTransformerEmbeddingFunction("sentence-transformers/all-MiniLM-L6-v2")

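# Create the collection, or reuse it if it already exists in the persistent store
# (create_collection raises an error when the notebook is re-run against the same path)
movie_db = chroma_client.get_or_create_collection(name="movies", embedding_function=hf_embeddings)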

In [8]:
# (B) Add Movies to ChromaDB
for i, row in df.iterrows():
    movie_db.add(
        ids=[str(i)],  # Unique movie ID
        embeddings=[row["embeddings"].tolist()],  # Embedding vector as a plain list of floats
        metadatas=[{"title": row["Series_Title"], "genre": row["Genre"], "director": row["Director"],
                    "stars": ", ".join([row["Star1"], row["Star2"], row["Star3"], row["Star4"]]),
                    "imdb_rating": float(row["IMDB_Rating"]), "votes": int(row["No_of_Votes"]),
                    "overview": row["Overview"], "gross": str(row["Gross"])}]
    )

print("✅ Movies stored in ChromaDB!")

✅ Movies stored in ChromaDB!
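The loop above issues one `add` call per movie. Because `add` accepts lists of ids, embeddings, and metadatas, the same data could be sent in chunks instead. The following is a sketch only: `add_in_batches`, the `batch_size` default, and the `FakeCollection` stand-in (used so the example runs without a database) are all hypothetical.

```python
def add_in_batches(collection, ids, embeddings, metadatas, batch_size=256):
    """Add records to a Chroma-style collection in chunks; return the number of add calls."""
    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(
            ids=ids[start:end],
            embeddings=embeddings[start:end],
            metadatas=metadatas[start:end],
        )
        calls += 1
    return calls


class FakeCollection:
    """Stand-in that records batch sizes instead of writing to a database."""

    def __init__(self):
        self.batch_sizes = []

    def add(self, ids, embeddings, metadatas):
        self.batch_sizes.append(len(ids))


# Toy data shaped like the 1000-movie dataset
ids = [str(i) for i in range(1000)]
embeddings = [[0.0, 0.0] for _ in ids]
metadatas = [{"title": f"movie {i}"} for i in range(1000)]

fake = FakeCollection()
print(add_in_batches(fake, ids, embeddings, metadatas), fake.batch_sizes)
```

With the real collection, the call would take `movie_db` in place of `fake`, with the lists built from the DataFrame columns.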


# Build a Personalized Recommendation System

In [9]:
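# (A) Search for Similar Movies
def get_similar_movies(movie_name, k=5):
    # Look up the query movie; .iloc[0] raises IndexError if the title is not in the dataset
    movie_info = df[df["Series_Title"] == movie_name].iloc[0]
    query_embedding = hf_model.encode(movie_info["movie_description"])

    # Retrieve the k nearest neighbours; the query movie itself is usually among them
    results = movie_db.query(query_embeddings=[query_embedding], n_results=k)

    # Drop the query movie, so this typically returns k - 1 titles rather than k
    return [res["title"] for res in results["metadatas"][0] if res["title"] != movie_name]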

In [10]:
# (B) Get Recommendations
print("🎥 Recommended Movies:", get_similar_movies("Inception"))

🎥 Recommended Movies: ['Memento', 'The Matrix', 'Interstellar', 'The Godfather: Part III']
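The output above lists four titles although `k=5`, because the query movie is filtered out after retrieval. One way to return exactly `k` titles is to request `k + 1` neighbours and trim after filtering. The sketch below shows only the filtering step on a toy metadata list; `titles_excluding_query` is a hypothetical helper, and the toy entries are placeholders, not real query results.

```python
def titles_excluding_query(metadatas, query_title, k):
    """Drop the query movie from a neighbour list and keep at most k titles."""
    titles = [meta["title"] for meta in metadatas if meta["title"] != query_title]
    return titles[:k]


# Toy neighbour list as it might come back when asking for k + 1 = 6 results
toy_metadatas = [
    {"title": "Query Movie"},
    {"title": "Neighbour 1"},
    {"title": "Neighbour 2"},
    {"title": "Neighbour 3"},
    {"title": "Neighbour 4"},
    {"title": "Neighbour 5"},
]

print(titles_excluding_query(toy_metadatas, "Query Movie", k=5))
```

Inside `get_similar_movies`, this would correspond to passing `n_results=k + 1` to `movie_db.query` and applying the helper to `results["metadatas"][0]`.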


# Real-Time User Personalization

In [11]:
user_profiles = {
    "U001": {
        "liked_genres": ["Sci-Fi", "Thriller"],
        "favorite_actors": ["Leonardo DiCaprio", "Robert Downey Jr."],
        "recent_watches": ["Inception", "Interstellar", "Iron Man"],
        "feedback": {"Interstellar": 5, "Inception": 4, "Iron Man": 3}
    }
}

In [12]:
def get_personalized_recommendations(user_id, k=3):
    user_profile = user_profiles[user_id]
    liked_genres = set(user_profile["liked_genres"])
    favorite_actors = set(user_profile["favorite_actors"])
    
    results = []
    
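    for movie in user_profile["recent_watches"]:
        # Candidates are the nearest neighbours of each recently watched movie
        movie_results = get_similar_movies(movie, k=5)
        for doc in movie_results:
            movie_data = df[df["Series_Title"] == doc].iloc[0]
            stars = [movie_data["Star1"], movie_data["Star2"], movie_data["Star3"], movie_data["Star4"]]
            # Keep a candidate if it shares a liked genre or a favorite actor
            if any(genre in liked_genres for genre in movie_data["Genre"].split(", ")) or \
               any(actor in favorite_actors for actor in stars):
                results.append(movie_data["Series_Title"])
    
    # De-duplicate; set ordering is arbitrary, and already-watched titles are not excluded
    return list(set(results))[:k]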

print("🎥 Personalized Recommendations:", get_personalized_recommendations("U001"))

🎥 Personalized Recommendations: ['The Avengers', 'E.T. the Extra-Terrestrial', 'Interstellar']
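The result above includes "Interstellar", which is already in the user's `recent_watches`, and the order depends on set iteration. A possible refinement is to score each candidate by how many liked genres and favorite actors it matches, skip watched titles, and sort deterministically. This is a sketch under assumptions: the scoring rule, the function names, and the toy candidates are hypothetical, and only the profile comes from the notebook.

```python
def score_candidate(candidate, profile):
    """Count how many liked genres and favorite actors a candidate matches."""
    genre_hits = len(set(candidate["genres"]) & set(profile["liked_genres"]))
    actor_hits = len(set(candidate["stars"]) & set(profile["favorite_actors"]))
    return genre_hits + actor_hits


def rank_candidates(candidates, profile, k=3):
    """Return up to k unseen, matching titles, best score first (ties broken by title)."""
    watched = set(profile["recent_watches"])
    # Keying by title removes duplicates; watched titles are skipped entirely
    unique = {c["title"]: c for c in candidates if c["title"] not in watched}
    ranked = sorted(unique.values(), key=lambda c: (-score_candidate(c, profile), c["title"]))
    return [c["title"] for c in ranked if score_candidate(c, profile) > 0][:k]


profile = {
    "liked_genres": ["Sci-Fi", "Thriller"],
    "favorite_actors": ["Leonardo DiCaprio", "Robert Downey Jr."],
    "recent_watches": ["Inception", "Interstellar", "Iron Man"],
}

# Toy candidates; the "Toy Movie" entries are placeholders, not dataset rows
toy_candidates = [
    {"title": "Inception", "genres": ["Action", "Adventure", "Sci-Fi"], "stars": ["Leonardo DiCaprio"]},
    {"title": "Toy Movie A", "genres": ["Sci-Fi", "Thriller"], "stars": ["Leonardo DiCaprio"]},
    {"title": "Toy Movie B", "genres": ["Drama"], "stars": []},
    {"title": "Toy Movie C", "genres": ["Thriller"], "stars": []},
    {"title": "Toy Movie A", "genres": ["Sci-Fi", "Thriller"], "stars": ["Leonardo DiCaprio"]},
]

print(rank_candidates(toy_candidates, profile))
```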


# AI-Powered Explanations

In [13]:
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load environment variables from a local .env file
load_dotenv()

# Read the API key and fail early with a clear message if it is missing
groq_api_key = os.environ.get("GROQ_API_KEY")
if not groq_api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your environment or .env file")

groq_llm = ChatGroq(api_key=groq_api_key, model_name="llama3-8b-8192", temperature=0.5)

In [14]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["movie", "user"],
    template="Explain why {user} might like the movie {movie} based on their preferences."
)

# Compose the prompt and the model; this replaces the deprecated LLMChain / chain.run API
chain = prompt | groq_llm

# Note: only the user ID string is passed here, so the model never sees the entries in user_profiles
response = chain.invoke({"movie": "The Matrix", "user": "U001"})

print(response.content)


A fun challenge!

U001's preferences are:

* Favorite books: Science fiction, fantasy, and horror
* Favorite authors: Isaac Asimov, Frank Herbert, and H.P. Lovecraft
* Favorite movies: Blade Runner, Star Wars, and The Silence of the Lambs
* Favorite TV shows: Doctor Who, Star Trek, and The X-Files
* Favorite music: Electronic, industrial, and ambient

Considering these preferences, here's why U001 might like The Matrix:

1. **Science fiction**: The Matrix is a thought-provoking sci-fi movie that explores complex themes like the nature of reality, free will, and the impact of technology on society. U001's love for science fiction makes this a natural fit.
2. **Philosophical themes**: The movie's exploration of the Matrix as a simulated reality raises questions about the nature of existence, which aligns with U001's interest in philosophical and thought-provoking content.
3. **Action and suspense**: The Matrix is known for its innovative action sequences and suspenseful plot, which might
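The response above describes preferences (books, authors, TV shows) that do not appear anywhere in `user_profiles`: the prompt contains only the string "U001", so the model has nothing real to work from. A minimal sketch of one fix is to write the profile fields into the prompt text. The helper name `build_explanation_prompt` and the exact wording are hypothetical; the profile is the one defined earlier in the notebook.

```python
def build_explanation_prompt(user_id, profile, movie):
    """Build a prompt that states the user's stored preferences explicitly."""
    # List rated movies from highest to lowest score
    rated = sorted(profile["feedback"].items(), key=lambda item: -item[1])
    ratings = ", ".join(f"{title}: {score}" for title, score in rated)
    return (
        f"User {user_id} likes the genres {', '.join(profile['liked_genres'])} "
        f"and the actors {', '.join(profile['favorite_actors'])}. "
        f"Their ratings of recent watches are: {ratings}. "
        f"Using only these preferences, explain why they might like the movie {movie}."
    )


profile = {
    "liked_genres": ["Sci-Fi", "Thriller"],
    "favorite_actors": ["Leonardo DiCaprio", "Robert Downey Jr."],
    "recent_watches": ["Inception", "Interstellar", "Iron Man"],
    "feedback": {"Interstellar": 5, "Inception": 4, "Iron Man": 3},
}

print(build_explanation_prompt("U001", profile, "The Matrix"))
```

The resulting string can be sent to the model directly, which keeps the explanation tied to stored data rather than invented preferences.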