# **The Rise and Evolution of Recommender Systems 🚀**  

Recommendation systems play a crucial role in **helping users discover relevant content** across various industries, including **e-commerce, healthcare, entertainment, marketing, and social media**.  

## **Why Are Recommendation Systems Important?**  
With vast amounts of data available, users often struggle to find the content and products they actually want. **Recommender systems bridge this gap** by analyzing user preferences and behavior to provide personalized suggestions.  

## **Types of Recommendation Systems**  
🔹 **Content-Based Filtering** → Recommends items based on a user's past interactions and preferences.  
🔹 **Collaborative Filtering** → Suggests items based on what similar users have liked.  
🔹 **Hybrid Systems 🌟** → Combine both, often enhanced with **machine learning** for **higher accuracy and personalization**.  

## **Where Are They Used?**  
Many tech companies rely on recommendation engines to improve user experience:  
✔️ **Flipkart** → Suggests products based on browsing and purchase history.  
✔️ **Instagram** → Recommends posts and reels tailored to user interests.  
✔️ **LinkedIn** → Suggests professional connections.  
✔️ **Hotstar & Gaana** → Provide personalized movie and music recommendations.  

## **How Do They Work?**  
The recommendation model **analyzes various factors** such as:  
📌 **Watch history** (Movies you’ve watched)  
📌 **Ratings** (Your feedback on movies)  
📌 **Genre preferences** (Action, Comedy, Thriller, etc.)  
📌 **Trending data** (What’s popular right now)  

This notebook uses **LightFM**, a hybrid model that can combine content-based and collaborative filtering, to build a smarter system that **adapts to user preferences and improves engagement** over time.  

### **What’s In the Model?**  
✅ We'll explore how to build a **Hybrid Movie Recommendation System** using the **TMDb 5000 dataset**  
✅ Implement **TF-IDF vectorization** for content similarity  
✅ Train a **collaborative filtering model with LightFM**  
✅ Deploy our recommendation system using **Flask & Ngrok**  

Let’s dive in and build something amazing! 🎬🔥  


# **Building a Hybrid Movie Recommendation System with TMDb 5000 Dataset 🎬**  

Welcome to this exciting journey where we'll build a **Movie Recommendation System** using the **[TMDB 5000 Movie Dataset](https://www.kaggle.com/tmdb/tmdb-movie-metadata)**! 📽️  

## **Why a Hybrid Recommendation System?**  
Traditional recommendation systems often rely on just **one** approach:  
✔️ **Content-Based Filtering** → Suggests movies similar to what you’ve liked before.  
✔️ **Collaborative Filtering** → Recommends based on what similar users prefer.  

However, each of these has limitations: the **cold start problem** for new users, **popularity bias**, and a **lack of diversity** in recommendations.  

💡 **Solution? A Hybrid Model!**   
We will **combine Content-Based and Collaborative Filtering** using **LightFM**, an advanced recommendation model that merges both techniques for smarter, more personalized recommendations! 🚀  

### **What We'll Do in This Notebook:**  
✅ Load and preprocess TMDb data  
✅ Extract key movie features (genres, cast, popularity)  
✅ Build a **collaborative filtering model** using LightFM  
✅ Use **TF-IDF** for content similarity  
✅ Merge both to create a **Hybrid Recommendation System**  
✅ Deploy the model as an API using **Flask & Ngrok**  




**So let's get started!**  🔥

# **1️⃣ Installing Required Packages**  

*    pyngrok makes our Flask app accessible online.
*    flask-cors enables API requests from other origins.
*    lightfm builds the hybrid model.
*    scikit-learn handles text vectorization and similarity.

In [None]:
!pip install pyngrok
!pip install lightfm
!pip install scikit-learn
!pip install flask-cors

print("\n\n\nInstalled required libraries")




Installed required libraries


# **2️⃣ Importing Libraries**

In [None]:
from flask import Flask, request, jsonify  # Creates a web API
import pandas as pd  # Works with tables and data
import numpy as np  # Handles numbers and calculations
from lightfm import LightFM  # Builds a hybrid recommendation system
from sklearn.feature_extraction.text import TfidfVectorizer  # Analyzes text data
from sklearn.metrics.pairwise import cosine_similarity  # Finds similarity between items
from scipy.sparse import csr_matrix  # Stores user-item interactions efficiently
import json, os, ast  # Helps with file handling and data processing
from pyngrok import ngrok  # Makes the API accessible online
from flask_cors import CORS  # Allows API access from different websites

from google.colab import drive
drive.mount('/content/drive')

print("Imported libraries Successfully ")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Imported libraries Successfully 


# **3️⃣ Initializing the Flask App**

In [None]:
# Initialize Flask app
app = Flask(__name__)
CORS(app)  # Enable CORS for all routes (allow API access from different websites)

<flask_cors.extension.CORS at 0x7e9311445d50>

# **4️⃣ Loading and Merging the Datasets**

In [None]:
# Load data and preprocess
movies = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Movie Recommender System/tmdb_5000_movies.csv')
credits = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Movie Recommender System/tmdb_5000_credits.csv')
data = movies.merge(credits, left_on='id', right_on='movie_id').drop('movie_id', axis=1)
data['title'] = data['title_x'].fillna(data['title_y'])
data.drop(columns=['title_x', 'title_y'], inplace=True)

print("Data loaded and preprocessed successfully")
data.head()

Data loaded and preprocessed successfully


Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,vote_average,vote_count,cast,crew,title
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...",Avatar
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",6.9,4500,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...",Pirates of the Caribbean: At World's End
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,6.3,4466,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...",Spectre
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,7.6,9106,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...",The Dark Knight Rises
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",6.1,2124,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...",John Carter


The credits file (`tmdb_5000_credits.csv`) contains the following features:

* movie_id - A unique identifier for each movie.
* title - The title of the movie.
* cast - The names of the lead and supporting actors.
* crew - The names of the director, editor, composer, writer, etc.

The movies file (`tmdb_5000_movies.csv`) has the following features:

* budget - The budget with which the movie was made.
* genres - The genres of the movie (Action, Comedy, Thriller, etc.).
* homepage - A link to the homepage of the movie.
* id - The same identifier as movie_id in the credits file.
* keywords - The keywords or tags related to the movie.
* original_language - The language in which the movie was made.
* original_title - The title of the movie before translation or adaptation.
* overview - A brief description of the movie.
* popularity - A numeric quantity specifying the movie's popularity.
* production_companies - The production houses of the movie.
* production_countries - The countries in which it was produced.
* release_date - The date on which it was released.
* revenue - The worldwide revenue generated by the movie.
* runtime - The running time of the movie in minutes.
* spoken_languages - The languages spoken in the movie.
* status - "Released" or "Rumored".
* tagline - The movie's tagline.
* title - The title of the movie.
* vote_average - The average rating the movie received.
* vote_count - The number of votes received.

I joined the two datasets by matching `id` in the movies file to `movie_id` in the credits file.
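
To see why the merged frame ends up with `title_x` and `title_y` columns, here is a minimal sketch on two made-up frames. The names `movies_toy` and `credits_toy` and all row values are invented for illustration. Both inputs carry a `title` column, so pandas appends its default `_x` and `_y` suffixes during the merge.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are invented for illustration).
movies_toy = pd.DataFrame({"id": [1, 2], "title": ["Movie A", "Movie B"], "budget": [100, 200]})
credits_toy = pd.DataFrame({"movie_id": [1, 2], "title": ["Movie A", "Movie B"], "cast": ["[]", "[]"]})

# Inner join: `id` on the left matches `movie_id` on the right.
merged = movies_toy.merge(credits_toy, left_on="id", right_on="movie_id").drop("movie_id", axis=1)

# Both inputs have a `title` column, so pandas renames them title_x / title_y.
merged["title"] = merged["title_x"].fillna(merged["title_y"])
merged = merged.drop(columns=["title_x", "title_y"])

print(list(merged.columns))
```

Because `merge` defaults to an inner join, a movie that appears in only one of the two files would be dropped from the result.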


# **Preprocessing the data**
Let's preprocess the data so that it is clean and consistent.
This prevents errors later in the notebook and improves the accuracy of the recommendation system.

### 1️⃣ Parsing Genres & Cast:

Extracts the genres and the top 5 actors from JSON-like text.
If a value cannot be parsed, genres fall back to `['unknown']` and cast falls back to `'Unknown'`.

### 2️⃣ Removing Duplicates:
Ensures each movie appears only once, using `id`.

### 3️⃣ Formatting Release Date:

Converts `release_date` to datetime for better sorting and filtering.
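
The parsing step can be tried on a single sample value before applying it to a whole column. This is only a sketch: `parse_names` is a hypothetical helper, not one of the notebook's functions, and `raw_genres` is a hand-written string in the style of the raw `genres` column.

```python
import ast

# One value in the style of the raw `genres` column: a string holding a list of dicts.
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

def parse_names(raw, limit=None):
    """Return the `name` fields from a JSON-like string, or [] if it cannot be parsed."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    # items[:None] keeps everything; items[:5] would keep the first five entries.
    return [item["name"] for item in items[:limit]]

print(parse_names(raw_genres))
print(parse_names(raw_genres, limit=1))
print(parse_names("not a list"))
```

`ast.literal_eval` only evaluates Python literals, so it is safe to run on untrusted text, unlike `eval`.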

In [None]:
# Preprocessing functions
def parse_genres(genre_str):
    try:
        return [g['name'].lower() for g in ast.literal_eval(genre_str)]
    except Exception as e:
        print(f"Error parsing genres: {genre_str}, {e}")
        return ['unknown']

def parse_cast(cast_str):
    try:
        return ', '.join([c['name'] for c in ast.literal_eval(cast_str)[:5]])
    except Exception as e:
        print(f"Error parsing cast: {cast_str}, {e}")
        return 'Unknown'

data['genres'] = data['genres'].apply(parse_genres)
data['cast'] = data['cast'].apply(parse_cast)
data.drop_duplicates(subset=['id'], inplace=True)
data['release_date'] = pd.to_datetime(data['release_date'], errors='coerce')



### **Trending Score Calculation 📊🎬**  

To improve movie recommendations, we calculate a **Trending Score** that considers popularity, ratings, and release year.  

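- **Popularity (weight 0.5)** → More popular movies score higher.  
- **Vote Average (weight 0.3)** → Better-rated movies get priority.  
- **Recentness Factor (weight 0.2)** → `1` for movies released in **2020 or later**, `0.5` otherwise.  

The weights apply to the raw column values, so `popularity`, which is much larger than `vote_average` for the movies shown above, dominates the score.

📌 **Formula Used:**
`Trending Score = (popularity × 0.5) + (vote_average × 0.3) + (recentness_factor × 0.2)`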



In [None]:
# Trending score calculation
def calculate_trending_score(row):
    # pd.notna guards against missing dates (NaT) produced by errors='coerce'
    recentness_factor = 1 if pd.notna(row['release_date']) and row['release_date'].year >= 2020 else 0.5
    return (row['popularity'] * 0.5) + (row['vote_average'] * 0.3) + (recentness_factor * 0.2)

data['trending_score'] = data.apply(calculate_trending_score, axis=1)
print("✅ Trending scores calculated successfully!")

✅ Trending scores calculated successfully!
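
As a worked example of the formula, the sketch below recomputes the score for the Avatar row shown in the `data.head()` output. The function `trending_score` is a hypothetical standalone version of the calculation, and the 2009 release year is an assumption, since `release_date` is not visible in that output.

```python
def trending_score(popularity, vote_average, release_year, recent_from=2020):
    """Weighted sum from the formula above; release_year may be None when the date is missing."""
    is_recent = release_year is not None and release_year >= recent_from
    recentness_factor = 1.0 if is_recent else 0.5
    return popularity * 0.5 + vote_average * 0.3 + recentness_factor * 0.2

# Avatar: popularity 150.437577 and vote_average 7.2 come from the table above.
# The release year (2009) is assumed for illustration.
avatar = trending_score(150.437577, 7.2, 2009)
print(round(avatar, 4))  # 75.2188 + 2.16 + 0.1
```

Here the popularity term contributes about 75.2 of the total, the rating term 2.16, and the recentness term 0.1. If the three weights are meant to act as real proportions, the inputs would need to be rescaled to a common range first.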


# **Caching Mechanism 🗄️⚡**  

To improve performance and reduce unnecessary computations, we implement a **caching mechanism** to store user recommendations.  

1️⃣ **Load Cache** → Reads stored recommendations from a JSON file.  
2️⃣ **Save Cache** → Saves new recommendations to the file.  
3️⃣ **Update Cache** → Stores a user's latest recommendations together with their last chosen genre.  

📌 **What reusing stored recommendations achieves:**  
✅ **Faster Recommendations** → Avoids recalculating recommendations for repeated requests.  
✅ **Efficient Data Handling** → Reduces load on the system by storing past results.  
✅ **Personalized Experience** → Keeps track of the last genre the user explored.  

⚡ **File Used for Cache:**  
📂 `/content/drive/MyDrive/Colab Notebooks/Movie Recommender System/user_cache.json`  



In [None]:

# Caching mechanism
CACHE_FILE = "/content/drive/MyDrive/Colab Notebooks/Movie Recommender System/user_cache.json"

def load_cache():
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "r") as file:
            try:
                return json.load(file)
            except json.JSONDecodeError:
                return {}
    return {}

def save_cache(cache):
    try:
        with open(CACHE_FILE, "w") as file:
            json.dump(cache, file, indent=4)
    except Exception as e:
        print(f"Error saving cache: {e}")

def update_cache(user_id, recommended, genre):
    cache = load_cache()
    recommended_cleaned = recommended.copy()
    recommended_cleaned['release_date'] = recommended_cleaned['release_date'].astype(str)
    cache[str(user_id)] = {
        "recommendations": recommended_cleaned.to_dict(orient="records"),
        "last_genre": genre
    }
    save_cache(cache)

print("✅ Caching mechanism set up successfully!")

✅ Caching mechanism set up successfully!
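
The cache layout is easiest to see with a small round trip. The sketch below writes to a temporary file rather than the real cache path, and `read_cache` and the sample record are hypothetical stand-ins that mirror the structure `update_cache` writes.

```python
import json
import os
import tempfile

# Hypothetical cache path in a temp directory, so the real cache file is not touched.
cache_path = os.path.join(tempfile.mkdtemp(), "user_cache.json")

def read_cache(path):
    """Return the cached dict, or {} if the file is missing or not valid JSON."""
    if not os.path.exists(path):
        return {}
    with open(path, "r") as f:
        try:
            return json.load(f)
        except json.JSONDecodeError:
            return {}

# Same layout as update_cache: user id as a string key, records plus last genre.
cache = read_cache(cache_path)
cache["1"] = {
    "recommendations": [{"title": "Movie A", "release_date": "2009-12-10"}],
    "last_genre": "action",
}
with open(cache_path, "w") as f:
    json.dump(cache, f, indent=4)

restored = read_cache(cache_path)
print(restored["1"]["last_genre"])
```

JSON object keys are always strings, which is why the notebook looks users up with `str(user_id)`. Dates are stored as strings for the same reason: `json.dump` cannot serialize pandas timestamps directly.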


# **Training LightFM Model 🎬📊**  

To build a **hybrid recommendation system**, we use the **LightFM** model, which combines **collaborative filtering** and **content-based filtering**.  

---

## **🔹 Training the Model**
1️⃣ **Simulate User-Item Interactions**  
   - Randomly assigns movie interactions to **100 simulated users**.  
   - Each user interacts with between **5 and 19 movies** (`np.random.randint(5, 20)` excludes the upper bound), simulating implicit feedback.  

2️⃣ **Convert Movie IDs to Matrix Indices**  
   - Since LightFM works with **numerical indices**, we map each movie ID to a **unique index**.  

3️⃣ **Create a Sparse Interaction Matrix**  
   - This matrix represents user interactions where:  
     - **Rows → Users**  
     - **Columns → Movies**  
     - **Values → Ratings (1 for interaction, 0 if no interaction)**  

4️⃣ **Initialize & Train the LightFM Model**  
   - **`loss='warp'`** → Uses **Weighted Approximate-Rank Pairwise (WARP)** loss, which prioritizes better ranking over raw ratings.  
   - **`no_components=64`** → The number of latent features (higher values can improve accuracy but increase computation).  
   - **`learning_rate=0.05`** → Defines how fast the model updates weights (too high may overshoot, too low may converge slowly).  
   - **`epochs=100`** → The number of training iterations (more epochs give the model more passes over the data, at the risk of overfitting).  
   - **`num_threads=8`** → Uses **8 CPU threads** for parallel computation, speeding up training.  

✅ **Why LightFM?**  
- **Can Handle Cold Start Problems** → LightFM can combine **collaborative** signals (user-item interactions) with **content-based** metadata (such as genre & cast) when feature matrices are passed to it. The training cell below passes only the interaction matrix, so the model there learns from interactions alone.  
- **Optimized for Ranking** → The WARP loss function focuses on **ranking recommendations correctly** instead of just predicting ratings.  
- **Scalable & Efficient** → Works well for large datasets with **sparse interactions**.  
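
Steps 2️⃣ and 3️⃣ above can be sketched on a toy set of interactions. The user numbers and the four interactions are made up, and the names `toy_interactions`, `id_to_index`, and `toy_matrix` are used only for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy data: three movie IDs and four (user, movie_id) interactions, invented for illustration.
movie_ids = [19995, 285, 206647]
toy_interactions = [(0, 19995), (0, 285), (1, 206647), (1, 19995)]

# Step 2: map each movie ID to a column index.
id_to_index = {mid: idx for idx, mid in enumerate(movie_ids)}

# Step 3: build the sparse matrix, with 1 where a user interacted with a movie.
rows = np.array([user for user, _ in toy_interactions])
cols = np.array([id_to_index[mid] for _, mid in toy_interactions])
values = np.ones(len(toy_interactions), dtype=np.float32)
toy_matrix = csr_matrix((values, (rows, cols)), shape=(2, len(movie_ids)))

print(toy_matrix.toarray())
```

Only the four non-zero entries are stored, which is what keeps the full 100-user matrix small even though it has one column per movie.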



In [None]:
# Train LightFM model
# Simulate implicit feedback: each of 100 users interacts with 5-19 random movies
user_interactions = [
    (user_id, movie_id, 1)
    for user_id in range(100)
    for movie_id in np.random.choice(data['id'].values, size=np.random.randint(5, 20), replace=False)
]
interactions = pd.DataFrame(user_interactions, columns=['user_id', 'item_id', 'rating'])
# Map each TMDb movie ID to its row position in `data`, used as the matrix column index
movie_id_to_index = {mid: idx for idx, mid in enumerate(data['id'])}
data['matrix_index'] = data['id'].map(movie_id_to_index)
interactions['item_id'] = interactions['item_id'].map(movie_id_to_index)
# Rows are users, columns are movies, values are 1 where an interaction exists
interaction_matrix = csr_matrix(
    (interactions['rating'], (interactions['user_id'], interactions['item_id'])),
    shape=(100, len(data))
)
model = LightFM(loss='warp', no_components=64, learning_rate=0.05)
model.fit(interaction_matrix, epochs=100, num_threads=8)
print("🚀 Model Training Completed Successfully!")

🚀 Model Training Completed Successfully!


# **📌 TF-IDF Vectorization & Cosine Similarity in Movie Recommendations 🎬📊**  

To make **content-based recommendations**, we need a way to measure the **similarity between movies** based on their **genres and cast**. We achieve this using **TF-IDF Vectorization** and **Cosine Similarity**.

TF-IDF (**Term Frequency - Inverse Document Frequency**) is a technique in **Natural Language Processing (NLP)** that weights each term by how often it appears in a document and **down-weights** terms that appear in many documents, so distinctive words count for more than common ones.

Once we convert movies into **TF-IDF vectors**, we measure how **similar** they are using **Cosine Similarity**.
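
A minimal sketch of this pipeline on three toy "genres + cast" strings is shown below. The titles and strings are illustrative, and `most_similar` is a hypothetical helper that shows one way a similarity matrix could be queried.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "genres + cast" strings; titles and text are illustrative only.
titles = ["Movie A", "Movie B", "Movie C"]
docs = [
    "action adventure Sam Worthington Zoe Saldana",
    "action adventure Johnny Depp Orlando Bloom",
    "comedy romance Hugh Grant Julia Roberts",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(tfidf)  # sim[i, j] is the similarity of movie i and movie j

def most_similar(index, top_n=1):
    """Hypothetical helper: titles of the top_n movies closest to titles[index]."""
    order = np.argsort(-sim[index])
    return [titles[i] for i in order if i != index][:top_n]

print(most_similar(0))
```

Movies A and B share the terms `action` and `adventure`, so their similarity is positive, while Movie C shares no terms with Movie A and scores zero. Note that the full matrix is dense with one row and one column per movie, so memory grows quadratically with the catalogue size.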





In [None]:
# TF-IDF Vectorization
vectorizer = TfidfVectorizer(stop_words='english')
feature_matrix = vectorizer.fit_transform(data['genres'].apply(lambda x: ' '.join(x)) + " " + data['cast'])
similarity_matrix = cosine_similarity(feature_matrix)


# **🎬 Creating a Movie Recommendation Function: Hybrid Approach 🤖🎯**  

This function **recommends movies to users** based on their **preferences, popularity, and trends**. It combines **LightFM scores, genre filtering, and trending analysis** to provide **relevant suggestions**. Note that the function as written ranks movies with LightFM scores and does not consult the TF-IDF `similarity_matrix` computed above.

---

## **🔹 How Does It Work?**
1. **🔄 Caching for Faster Results:**  
   - If the user has received recommendations before and no genre is specified, it **retrieves results from cache** for efficiency.  

2. **🧊 Cold Start Problem Handling:**  
   - If the user is new (no prior interactions), the system **recommends trending/popular movies** using:
     - `trending_score`
     - `popularity`  

3. **⚡ LightFM Model for Personalized Recommendations:**  
   - If the user has interaction data, the model **predicts scores for all movies**.
   - It ranks them based on **predicted relevance**.

4. **🎭 Genre-Based Filtering (Optional):**  
   - If the user requests a specific **genre**, the function **filters movies** that belong to that genre.

5. **📈 Sorting by Trending Score (Optional):**  
   - If the `filter_type` is `"trending"`, movies are ranked based on their **trending score** (a weighted metric of popularity, reviews, and release date).  

6. **💾 Updates the Cache for Future Recommendations:**  
   - Saves recommendations to avoid re-computation.  

---
### **1️⃣ Cold Start Problem 🧊**
- When a new user has no history, **personalized recommendations cannot be generated**.
- **Solution:** Recommend trending or popular movies.

### **2️⃣ Hybrid Recommendation System 🔄**
- **Collaborative Filtering (LightFM Model)** → Learns from past user interactions.  
- **Content-Based Filtering (Genres, Cast)** → Matches movies with similar features.  
- **Trending Factor** → Ensures fresh, relevant recommendations.

### **3️⃣ Ranking & Sorting 🏆**
- **Sorting by Scores** → Ensures top recommendations are the best fit.  
- **Filtering by Genre** → Allows **personalized** selections.  
- **Trending Score Factor** → Prioritizes recent and popular movies.  
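
The genre filter and top-N selection can be sketched in isolation with NumPy. The helper `top_n_with_mask` and the toy scores are hypothetical. The point of the sketch is the last line of the helper: when fewer than `top_n` movies match the genre, the masked-out items are dropped instead of padding the result.

```python
import numpy as np

def top_n_with_mask(scores, mask, top_n):
    """Return indices of the highest-scoring items where mask is True, best first."""
    masked = np.where(mask, scores, -np.inf)
    order = np.argsort(-masked)[:top_n]
    # Drop masked-out items so they cannot pad a short result.
    return order[np.isfinite(masked[order])]

# Toy model scores for five movies; only movies 1 and 3 match the requested genre.
toy_scores = np.array([0.9, 0.2, 0.7, 0.5, 0.1])
genre_mask = np.array([False, True, False, True, False])

picked = top_n_with_mask(toy_scores, genre_mask, top_n=3)
print(picked)
```

Three results were requested but only two movies match, so only those two indices come back, ordered by score.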

---



In [None]:
# Recommendation function
def recommend_movies(user_id, genre=None, filter_type="popular", top_n=10):

    # Checks Cached Recommendations

    cache = load_cache()
    if not genre and str(user_id) in cache:
        print("\n🔹 Using Cached Recommendations!")
        return pd.DataFrame(cache[str(user_id)]['recommendations']).head(top_n)

    # Handles Cold Start Users
    # If the user has no earlier interactions, it sorts movies by trending score & popularity and recommends the most popular movies.

    if user_id not in interactions['user_id'].unique():
        print("\n🧊 Cold Start Detected! Recommending trending movies.")
        cold_start_recommendations = data.sort_values(by=['trending_score', 'popularity'], ascending=False).head(top_n)
        return cold_start_recommendations

    scores = model.predict(user_id, np.arange(len(data)))


    # Filters by Genre (If Specified)

    if genre:
        mask = data['genres'].apply(lambda g: genre.lower() in g)
        scores = np.where(mask, scores, -np.inf)

    #Selects Top-N Recommended Movies

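    top_indices = np.argsort(-scores)[:top_n]
    # Drop movies excluded by the genre filter so they cannot pad a short result
    top_indices = top_indices[np.isfinite(scores[top_indices])]
    recommended = data.iloc[top_indices]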

    if filter_type == "trending":
        recommended = recommended.sort_values(by='trending_score', ascending=False)

    # Updates Cache for Faster Future Recommendations

    update_cache(user_id, recommended, genre)
    return recommended.head(top_n)


# **API Endpoint for Movie Recommendations**

Let's define an API endpoint (`/recommend`) that allows users to request movie recommendations based on their user ID, genre preference, filtering type, and number of recommendations.

In [None]:
# API Endpoint for Recommendations
@app.route('/recommend', methods=['GET'])
def recommend():
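    # user_id is required; reject missing or non-numeric values with a 400
    try:
        user_id = int(request.args.get('user_id', ''))
        top_n = int(request.args.get('top_n', 10))
    except ValueError:
        return jsonify({"error": "user_id and top_n must be integers."}), 400
    genre = request.args.get('genre', None)
    filter_type = request.args.get('filter_type', 'popular')

    recommendations = recommend_movies(user_id, genre, filter_type, top_n)

    if recommendations.empty:
        return jsonify({"error": "No recommendations found."}), 404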

    return jsonify(recommendations.to_dict(orient='records'))


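#### **API Overview**
- **Endpoint:** `/recommend`
- **Method:** GET
- **Query parameters:** `user_id` (required), `genre`, `filter_type` (`popular` or `trending`), `top_n`
- **Purpose:** Provides personalized movie recommendations based on user interactions.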
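
A request URL for this endpoint can be assembled with the standard library. This is a sketch only: `build_recommend_url` is a hypothetical helper, and the base address is Flask's default local address, to be swapped for the ngrok URL when tunnelling.

```python
from urllib.parse import urlencode

def build_recommend_url(base_url, user_id, genre=None, filter_type="popular", top_n=10):
    """Build a GET URL for /recommend; base_url is whatever address the app is served on."""
    params = {"user_id": user_id, "filter_type": filter_type, "top_n": top_n}
    if genre:
        params["genre"] = genre
    return f"{base_url.rstrip('/')}/recommend?{urlencode(params)}"

# 127.0.0.1:5000 is Flask's default local address.
url = build_recommend_url("http://127.0.0.1:5000", user_id=1, genre="action", top_n=5)
print(url)
```

While the server is running, the resulting URL can be opened in a browser or fetched with `urllib.request.urlopen(url)`; the response is a JSON list of movie records.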

# **📌 Ngrok Integration for Flask App**
Ngrok is used to expose a local Flask application to the internet, making it accessible through a public URL. This is useful for testing APIs, webhooks, and remote access without deploying to a server.

In [None]:
# Authenticate ngrok
# Never publish a real authtoken in a shared notebook; paste yours here only when running locally
ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # Replace with your actual authtoken

# Run the Flask app with ngrok
public_url = ngrok.connect(5000).public_url
print(f" * Running on {public_url}")

# Run the Flask app
app.run()

 * Running on https://9175-35-196-177-83.ngrok-free.app
 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [17/Mar/2025 12:02:54] "GET /recommend?user_id=1&genre=action&filter_type=popular&top_n=5 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [17/Mar/2025 12:03:05] "GET /recommend?user_id=1&genre=action&filter_type=trending&top_n=5 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [17/Mar/2025 12:03:20] "GET /recommend?user_id=1&genre=action&filter_type=popular&top_n=5 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [17/Mar/2025 12:03:26] "GET /recommend?user_id=1&genre=action&filter_type=trending&top_n=5 HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [17/Mar/2025 12:03:36] "GET /recommend?user_id=1&genre=action&filter_type=&top_n=5 HTTP/1.1" 200 -

