### Hybrid model

A hybrid recommendation system combines two or more recommendation techniques, such as collaborative filtering, content-based filtering, and popularity-based methods, to generate more accurate, personalized, and robust recommendations. Each individual approach has its strengths and limitations. For instance, collaborative filtering suffers from the cold-start problem for new users or items, while content-based methods tend to recommend items very similar to what a user already knows, which limits diversity. By blending these approaches, hybrid models aim to offset each method's weaknesses while keeping its strengths.

In the context of movie recommendations, a hybrid model might use user-item rating patterns (collaborative filtering) together with movie metadata such as genres or overviews (content-based filtering). For example, it can recommend movies that are similar in content to a movie the user liked and are also liked by users with similar preferences. This can make recommendations more relevant, especially when the rating data is sparse or noisy.

The hybrid recommender built in this notebook combines **content-based filtering** (on genres) with **collaborative filtering** (SVD) to produce personalized movie recommendations in three steps.

##### **Step 1: Content-Based Filtering**

- **TF-IDF Vectorization**  
  Turns each movie's genre string into a numerical vector, weighting each genre term by how rare it is across movies, so sharing an uncommon genre counts for more than sharing a common one.

- **K-Nearest Neighbors (KNN)**  
  Uses cosine distance with brute-force search to retrieve the 200 nearest neighbours of the selected movie's genre vector.
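
A minimal, self-contained sketch of this step on a few made-up genre strings. It assumes genres are stored as delimited text such as "Action Adventure" or "Action|Adventure" (the default TF-IDF tokenizer splits on either) and fits the index on one row per movie; unlike the app, it uses the vectorizer's default settings without English stop words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy catalogue: one genre string per movie
toy_genres = [
    "Action Adventure Science Fiction",  # movie 0
    "Action Adventure",                  # movie 1
    "Comedy Romance",                    # movie 2
    "Romance Drama",                     # movie 3
]

def genre_neighbours(genres, query_idx, k=2):
    """Return (movie index, cosine distance) for the k most genre-similar movies."""
    matrix = TfidfVectorizer().fit_transform(genres)  # rarer genre terms get higher weights
    knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(matrix)
    # Ask for k + 1 neighbours because the query movie is returned as its own neighbour
    distances, indices = knn.kneighbors(matrix[query_idx], n_neighbors=min(k + 1, len(genres)))
    pairs = [(int(i), float(d)) for i, d in zip(indices[0], distances[0]) if i != query_idx]
    return pairs[:k]

print(genre_neighbours(toy_genres, query_idx=1, k=1))  # closest to "Action Adventure"
print(genre_neighbours(toy_genres, query_idx=2, k=1))  # closest to "Comedy Romance"
```

Movies that share no genre term with the query get a cosine distance of 1.0, so they sort behind any movie with at least one genre in common.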

##### **Step 2: Collaborative Filtering**

- **Singular Value Decomposition (SVD)**  
  - Implemented with the `surprise` library.
  - Trained on the `userId`, `movieId`, `rating` data, SVD learns latent factors that capture patterns in user preferences and movie characteristics.
  - For each content-similar movie found in Step 1, the model predicts the rating the input user would give that movie.
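
Surprise's documentation describes its `SVD` prediction as `r_hat = mu + b_u + b_i + q_i · p_u` (global mean, user bias, item bias, and the dot product of item and user latent factors), fitted by stochastic gradient descent. The NumPy sketch below implements that biased matrix-factorization model from scratch on hypothetical toy ratings to show what "latent features" means in practice. It is an illustration, not Surprise's code, and the hyperparameters (2 factors, 300 epochs, learning rate 0.02) are arbitrary choices for the toy data. In the notebook, `svd.predict(user_id, movie_id).est` plays the role of `predict` here.

```python
import numpy as np

# Hypothetical toy ratings as (user index, item index, rating) on the 0.5-5 scale.
# Users 0-1 like items 0-1; users 2-3 like items 2-3.
toy_ratings = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 3, 1.0),
    (2, 2, 5.0), (2, 3, 4.0), (2, 0, 1.0),
    (3, 2, 4.0), (3, 3, 5.0), (3, 1, 2.0),
]

def fit_biased_mf(ratings, n_users, n_items, n_factors=2, n_epochs=300, lr=0.02, reg=0.02, seed=0):
    """SGD for r_hat = mu + b_u + b_i + q_i . p_u (biased matrix factorization)."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))   # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)      # user and item biases
    P = rng.normal(0.0, 0.1, (n_users, n_factors))    # user latent factors p_u
    Q = rng.normal(0.0, 0.1, (n_items, n_factors))    # item latent factors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()                           # keep the old p_u for the q_i update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return mu, bu, bi, P, Q

def predict(model, u, i, low=0.5, high=5.0):
    """Predicted rating, clipped to the rating scale."""
    mu, bu, bi, P, Q = model
    return float(np.clip(mu + bu[u] + bi[i] + Q[i] @ P[u], low, high))

model = fit_biased_mf(toy_ratings, n_users=4, n_items=4)
print(round(predict(model, 0, 3), 2))  # user 0 never rated item 3
```

User 0 never rated item 3, but because the learned factors place user 0 near user 1 (who disliked item 3), the predicted rating comes out low. This is the same mechanism that scores each content-similar candidate for the input user.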

##### **Step 3: Hybrid Scoring**

Content-based filtering decides which movies are candidates; each candidate is then scored by blending a general popularity signal with the collaborative prediction:

`hybrid_score = (vote_average + svd_score) / 2`

* `vote_average` reflects how well the movie is rated by the general public.

* `svd_score` is the rating the SVD model predicts this specific user would give the movie.

The candidates are sorted by this hybrid score, and the top-N are returned as recommendations.
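
The two terms are only comparable if they share a scale. `svd_score` lives on the 0.5 to 5 rating scale set by `Reader(rating_scale=(0.5, 5.0))`. If `vote_average` is a 0 to 10 average (as in TMDB metadata, which the column name suggests but the notebook does not confirm), it dominates the sum and can push the hybrid score past the top of the rating scale. A minimal sketch of one possible adjustment, a linear rescale of `vote_average` onto the rating scale before averaging; the 0 to 10 source range and the helper names `rescale` and `hybrid_score` are assumptions for illustration.

```python
def rescale(value, src=(0.0, 10.0), dst=(0.5, 5.0)):
    """Linearly map value from the src range onto the dst range.
    The 0-10 source range for vote_average is an assumption."""
    (s_lo, s_hi), (d_lo, d_hi) = src, dst
    return d_lo + (value - s_lo) * (d_hi - d_lo) / (s_hi - s_lo)

def hybrid_score(vote_average, svd_score, rescale_votes=True):
    """Average of the public score (optionally rescaled) and the personalized SVD score."""
    public = rescale(vote_average) if rescale_votes else vote_average
    return (public + svd_score) / 2

# Hypothetical movie: 8.0/10 public average, predicted rating 4.0 for this user
print(hybrid_score(8.0, 4.0, rescale_votes=False))  # 6.0, above the 5.0 rating ceiling
print(round(hybrid_score(8.0, 4.0), 2))             # 4.05, on the 0.5-5 scale
```

A weighted average with a tuned weight is a common variation of the same idea. Whether rescaling improves this notebook's metrics would need to be re-measured.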


### Model Performance

Accuracy
- What it means: How often the model's "liked"/"not liked" label (predicted rating ≥ 3.5) matches the label derived from the actual rating.

Precision
- What it means: Of all the movies predicted as "liked", how many were actually liked?

Recall
- What it means: Of all the movies the user actually liked, how many did the model correctly identify?

F1 Score
- What it means: The harmonic mean of precision and recall, which balances the two.

RMSE (Root Mean Squared Error)
- What it means: The square root of the average squared difference between predicted and actual ratings; squaring penalizes larger errors more heavily.

MAE (Mean Absolute Error)
- What it means: The average absolute difference between predicted and actual ratings.
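
A small worked example of these metrics on four hypothetical ratings, using the same rule as the evaluation code in this section: a rating of 3.5 or higher counts as "liked". Plain Python is used so every number can be checked by hand; the notebook itself computes the same quantities with scikit-learn.

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE and MAE between actual and predicted ratings."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

def classification_metrics(y_true, y_pred, threshold=3.5):
    """Accuracy, precision, recall and F1 after labelling ratings >= threshold as liked."""
    actual = [r >= threshold for r in y_true]
    predicted = [r >= threshold for r in y_pred]
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum(p and not a for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    tn = len(actual) - tp - fp - fn
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [4.0, 2.0, 5.0, 3.0]   # hypothetical actual ratings
y_pred = [3.5, 3.0, 4.5, 4.0]   # hypothetical predictions
rmse, mae = regression_metrics(y_true, y_pred)
print(round(rmse, 4), mae)                                            # 0.7906 0.75
print([round(m, 4) for m in classification_metrics(y_true, y_pred)])  # [0.75, 0.6667, 1.0, 0.8]
```

The errors are 0.5, 1.0, 0.5 and 1.0 points, so MAE is 0.75 while RMSE is higher (about 0.79) because squaring weights the 1.0-point misses more. The last prediction (4.0 for an actual 3.0) is a false positive, which lowers precision to 2/3 while recall stays at 1.0.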

In [4]:
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("cleaned_movie_ratings.csv").dropna(subset=["userId", "movieId", "rating", "vote_average"])

# Prepare Surprise dataset
reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

# Train-test split
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
svd = SVD(random_state=42)  # fixed seed so the scores below are reproducible across runs
svd.fit(trainset)

# Predict test ratings with SVD, then attach each movie's vote_average
predictions = svd.test(testset)
pred_df = pd.DataFrame([(p.uid, p.iid, p.r_ui, p.est) for p in predictions], columns=["userId", "movieId", "actual", "svd_pred"])
vote_avg = df.drop_duplicates("movieId").set_index("movieId")["vote_average"]
pred_df["vote_average"] = pred_df["movieId"].map(vote_avg)
pred_df.dropna(inplace=True)
pred_df["hybrid_pred"] = (pred_df["svd_pred"] + pred_df["vote_average"]) / 2

# Evaluation
y_true = pred_df["actual"]
y_pred = pred_df["hybrid_pred"]
y_true_bin = (y_true >= 3.5).astype(int)
y_pred_bin = (y_pred >= 3.5).astype(int)

print("📊 Hybrid Model Evaluation")
print("RMSE:", round(mean_squared_error(y_true, y_pred) ** 0.5, 4))  # squared=False is deprecated since scikit-learn 1.4
print("MAE:", round(mean_absolute_error(y_true, y_pred), 4))
print("Accuracy:", round(accuracy_score(y_true_bin, y_pred_bin), 4))
print("Precision:", round(precision_score(y_true_bin, y_pred_bin), 4))
print("Recall:", round(recall_score(y_true_bin, y_pred_bin), 4))
print("F1 Score:", round(f1_score(y_true_bin, y_pred_bin), 4))


📊 Hybrid Model Evaluation
RMSE: 1.8794
MAE: 1.5973
Accuracy: 0.614
Precision: 0.6169
Recall: 0.9854
F1 Score: 0.7588




##### **Model Performance Metrics**

| Metric     | Value   | Meaning |
|------------|---------|---------|
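| **RMSE**       | `1.8803` | Typical rating error, with large misses weighted more heavily: about **1.88 points** on the 0.5 to 5 scale. Lower is better. |
| **MAE**        | `1.5985` | Predictions are off by about **1.60 points** on average. Lower is better. |
| **Accuracy**   | `0.6140` | **61.4%** of predictions correctly classify movies as "liked" (rating ≥ 3.5) or "not liked." |
| **Precision**  | `0.6169` | About **61.7%** of movies predicted as liked were actually liked. |
| **Recall**     | `0.9854` | **98.5%** of the movies users liked were predicted as liked. With precision and accuracy both near 0.61, this mostly reflects the model labeling almost every movie as liked. |
| **F1 Score**   | `0.7588` | Driven largely by the high recall; the modest precision shows the model does not reliably separate liked from not-liked movies. |

The RMSE and MAE in this table (`1.8803`, `1.5985`) differ slightly from the cell output above (`1.8794`, `1.5973`), most likely because they come from a different run: `SVD()` initializes its latent factors randomly when no `random_state` is set, so results vary between runs.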


### **Hybrid Movie Recommendation App (Streamlit UI)**

This code represents the **user interface component** of a Streamlit-based **hybrid movie recommendation system**, designed to deliver personalized movie suggestions interactively.

##### **How it works:**

1. **Select Movie Title**
   - Users interact with a searchable dropdown containing movie titles.
   - Allows easy selection of a film of interest.

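2. **Enter User ID**
   - Users enter their User ID.
   - The ID personalizes the collaborative-filtering (**SVD model**) scores. For an ID that does not appear in the ratings data, Surprise's SVD falls back to a non-personalized estimate (the global mean rating plus the movie's bias).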

3. **Choose Number of Recommendations**
   - A slider enables users to select how many recommendations they wish to receive (**between 5 and 50**).

4. **Generate Recommendations**
   - Clicking the **"Recommend"** button initiates:
     - **Content-based filtering** (genre similarity).
     - **Collaborative filtering** (SVD-predicted ratings).
   - Combines both methods to create a ranked list of recommended movies.

5. **Display Results**
   - Recommendations appear in a clear table with columns:
     - **Title**
     - **Genres**
     - **Vote Average** (general audience preference)
     - **Predicted SVD Score** (personalized rating prediction)
     - **Final Hybrid Score** (average of vote average and SVD score)

6. **Handling No Recommendations**
   - If no suitable recommendations are found, a user-friendly warning message is displayed.




In [7]:
with open("app4.py", "w", encoding="utf-8") as f:  # the app source contains emoji
    f.write('''import streamlit as st
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from surprise import Dataset, Reader, SVD

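# Load cleaned data once; Streamlit reruns this whole script on every widget interaction
@st.cache_data(show_spinner=False)
def load_ratings(path="cleaned_movie_ratings.csv"):
    ratings = pd.read_csv(path)
    ratings["genres"] = ratings["genres"].fillna("")
    return ratings

df = load_ratings()

# --- Collaborative Filtering Setup (SVD), trained once and cached ---
@st.cache_resource(show_spinner=False)
def train_svd(_ratings):
    reader = Reader(rating_scale=(0.5, 5.0))
    data = Dataset.load_from_df(_ratings[["userId", "movieId", "rating"]], reader)
    model = SVD(random_state=42)
    model.fit(data.build_full_trainset())
    return model

svd_model = train_svd(df)

# --- Content-Based Filtering Setup (KNN on genres) ---
# df has one row per rating, so index genres on one row per movie instead
movies = df.drop_duplicates(subset="movieId")[["movieId", "title", "vote_average", "genres"]].reset_index(drop=True)

@st.cache_resource(show_spinner=False)
def build_genre_index(_movies):
    tfidf_vectorizer = TfidfVectorizer(stop_words="english")
    matrix = tfidf_vectorizer.fit_transform(_movies["genres"])
    model = NearestNeighbors(metric="cosine", algorithm="brute")
    model.fit(matrix)
    return matrix, model

tfidf_matrix, knn_model = build_genre_index(movies)

# --- Mappings: title -> row position in `movies` (first match if a title repeats) ---
title_to_index = pd.Series(movies.index, index=movies["title"])
title_to_index = title_to_index[~title_to_index.index.duplicated(keep="first")]

# --- Hybrid Recommendation Function ---
def recommend_hybrid(movie_title, user_id, n=10, n_candidates=200):
    if movie_title not in title_to_index:
        return pd.DataFrame()  # empty result -> the UI shows its warning

    idx = title_to_index[movie_title]
    k = min(n_candidates + 1, len(movies))  # +1 because the query movie is its own nearest neighbour
    distances, indices = knn_model.kneighbors(tfidf_matrix[idx], n_neighbors=k)

    recommendations = movies.iloc[indices[0]].copy()
    recommendations = recommendations[recommendations["title"] != movie_title]  # drop the selected movie
    recommendations = recommendations.drop_duplicates(subset="title")  # ensure unique titles

    # Add collaborative SVD predictions for this user
    recommendations["svd_score"] = recommendations["movieId"].apply(lambda x: svd_model.predict(user_id, x).est)

    # Combine to hybrid score (same formula as the evaluation cell)
    recommendations["hybrid_score"] = (recommendations["vote_average"] + recommendations["svd_score"]) / 2

    return recommendations.sort_values(by="hybrid_score", ascending=False).head(n)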

# ------------------ Streamlit UI ------------------
st.set_page_config(page_title="🎬 Hybrid Recommender", layout="centered")
st.title("🎬 Hybrid Movie Recommendation System")
st.markdown("This system combines **Content-Based Filtering (Genres)** and **Collaborative Filtering (SVD)** to deliver personalized movie recommendations.")

# 🎞️ Search and Select Movie
movie_list = sorted(df['title'].dropna().unique())
movie_input = st.selectbox("🎞️ Search and Select a Movie:", movie_list)

# 👤 User ID Input
user_id_input = st.number_input("👤 Enter User ID:", min_value=1, step=1)

# 🔢 Number of Recommendations
top_n = st.slider("📌 Number of Recommendations:", 5, 50, 10)

# 🎯 Show Recommendations
if st.button("🎯 Recommend"):
    results = recommend_hybrid(movie_input, user_id_input, n=top_n)
    if not results.empty:
        st.success(f"✅ Top {top_n} Hybrid Recommendations for User {user_id_input} Based on '{movie_input}':")
        st.dataframe(results[["title", "genres", "vote_average", "svd_score", "hybrid_score"]])
    else:
        st.warning("⚠️ No recommendations found. Try another movie or user ID.")

''')

print("app4.py file created successfully.")

app4.py file created successfully.


In [None]:
!streamlit run app4.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8503
Network URL: http://192.168.1.199:8503


###  **Model Comparison: Collaborative (SVD) vs. Hybrid Model**

| Metric       | Collaborative (SVD) | Hybrid Model | What It Tells Us |
|--------------|---------------------|--------------|------------------|
| **RMSE**     | `0.9018` ✅          | `1.8803`     | **SVD** predicts ratings closer to actual values. |
| **MAE**      | `0.6914` ✅          | `1.5985`     | **SVD** has smaller average prediction errors. |
| **Accuracy** | `0.7035` ✅          | `0.6140`     | **SVD** correctly classifies more predictions overall. |
| **Precision**| `0.7858` ✅          | `0.6169`     | **SVD** recommendations are more relevant (movies predicted as liked are indeed liked). |
| **Recall**   | `0.7132`            | `0.9854` ✅   | **Hybrid** misses fewer liked movies, largely because it labels almost every movie as liked (note its low precision). |
| **F1 Score** | `0.7477`            | `0.7588` ✅   | **Hybrid** edges ahead only slightly, driven by its recall rather than by more precise predictions. |
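
The hybrid's near-perfect recall is easier to interpret once the reported accuracy, precision, and recall are turned back into label rates. Writing p for the share of test pairs actually liked and q for the share the model labels as liked, TP = recall · p and q = TP / precision; since accuracy = TP + TN = 1 - p - q + 2 · TP, the equation can be solved for p. The helper `label_rates` below is a hypothetical name for this arithmetic; it only re-uses the rounded numbers from the table, so its results carry the same rounding.

```python
def label_rates(accuracy, precision, recall):
    """Back out the actual-positive rate p and predicted-positive rate q
    from accuracy, precision and recall (all as fractions of the test set)."""
    # accuracy = 1 - p - q + 2*TP, with TP = recall*p and q = TP/precision
    p = (1 - accuracy) / (1 + recall / precision - 2 * recall)
    q = recall * p / precision
    return p, q

reported = {
    "SVD": (0.7035, 0.7858, 0.7132),     # accuracy, precision, recall from the table
    "Hybrid": (0.6140, 0.6169, 0.9854),
}
for name, (acc, prec, rec) in reported.items():
    p, q = label_rates(acc, prec, rec)
    print(f"{name}: actually liked {p:.1%}, labelled liked {q:.1%}")
```

With the table's values, both models give p of about 62%, consistent with evaluation on similar test data, while q is about 98% for the hybrid versus about 56% for SVD. In other words, the hybrid's recall comes from labelling almost every movie as liked, which is also why its accuracy and precision sit close to the share of liked pairs.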


#### **Best-Performing Model: Collaborative Filtering (SVD)**

Based on evaluating multiple recommendation approaches, the **Collaborative Filtering (SVD)** model demonstrates the strongest performance, particularly in terms of **Root Mean Squared Error (RMSE)**.

- **RMSE**: The SVD model has the **lowest RMSE** of **0.9018**, indicating the most accurate predictions of user ratings compared to both Content-Based and Hybrid models.
- **Accuracy and Precision**: SVD also leads on accuracy (`0.7035`) and precision (`0.7858`), so the movies it predicts a user will like are more often actually liked.
- **Comparison to Hybrid Model**: The Hybrid model's higher recall and slightly higher F1 largely come from labeling nearly every movie as liked (its accuracy and precision are both only about 0.61 while recall is 0.985), not from better predictions. SVD combines the lowest rating error with the highest precision.

✅ **Conclusion**:  
Among the models evaluated here, **Collaborative Filtering (SVD)** gives the most accurate and reliable predictions, making it the strongest choice for this recommendation system.

