### **Collaborative Filtering**

**Collaborative filtering** is a recommendation technique that suggests movies to users based on the preferences of other users with similar tastes.

- It identifies **patterns in user ratings** and predicts how much a user might like a movie they haven't seen yet.
- Unlike content-based filtering, it **does not rely on movie content** (e.g., genre or description).
- Instead, it leverages the **behavior and ratings of many users** to generate personalized recommendations.

The underlying assumption is that users who agreed in the past will tend to agree in the future. The system works from behavior patterns such as ratings:

- It finds similar users or items based on those patterns.
- If User A and User B rate several movies similarly, User A may like movies that User B rated highly but User A hasn’t seen yet.
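
The "similar users" idea above can be sketched with a neighborhood-style example. This is only an illustration of the intuition, not the SVD model used later in this notebook; the toy ratings, movie titles, and function names are all invented for the example, and cosine similarity over co-rated movies is one of several reasonable similarity choices.

```python
from math import sqrt

# Toy ratings, invented for illustration: user -> {movie: rating}
ratings = {
    "A": {"Heat": 5.0, "Alien": 4.0, "Clue": 1.0},
    "B": {"Heat": 4.5, "Alien": 4.0, "Clue": 1.5, "Fargo": 5.0},
    "C": {"Heat": 1.0, "Alien": 2.0, "Clue": 5.0, "Big": 4.5},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def most_similar_user(target, ratings):
    """Return the other user whose ratings are closest to the target's."""
    others = (name for name in ratings if name != target)
    return max(others, key=lambda name: cosine_similarity(ratings[target], ratings[name]))

neighbor = most_similar_user("A", ratings)
# Movies the neighbor has rated that the target has not seen yet
candidates = sorted(set(ratings[neighbor]) - set(ratings["A"]))
print(neighbor, candidates)
```

Here User A's nearest neighbor is the user with the most similar ratings on shared movies, and the candidates are that neighbor's movies that User A has not rated.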



### **Collaborative Filtering with Singular Value Decomposition (SVD)**

This collaborative filtering system is built using **Singular Value Decomposition (SVD)**, a model-based approach to recommending movies based on user preferences.

- The system loads a cleaned dataset of user ratings and prepares it using the **Surprise library**.
- The **SVD model** is trained on a subset of the data to learn latent features that capture relationships between users and movies.
- It **predicts** how a user might rate unseen movies by identifying patterns from other users with similar tastes.
- The app then recommends the **top N movies** for a selected user by sorting predicted ratings in descending order.
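
To make "latent features" more concrete, the sketch below fits a tiny biased matrix factorization with stochastic gradient descent in plain Python. It assumes the usual prediction rule `mu + b_u + b_i + p_u · q_i` (global mean, user bias, item bias, and a dot product of latent vectors), which is the form the Surprise documentation describes for `SVD`. The toy ratings, hyperparameters, and function names are made up for illustration; this is not Surprise's implementation.

```python
import random

def train_biased_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i with plain SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    # Small random latent vectors for every user and item
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + b_u[u] + b_i[i] + dot

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return mu, predict

def rmse(predict, ratings):
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Toy (user, movie, rating) triples, invented for illustration
toy = [
    ("u1", "m1", 5.0), ("u1", "m2", 4.0), ("u1", "m3", 1.0),
    ("u2", "m1", 4.5), ("u2", "m2", 4.0), ("u2", "m4", 5.0),
    ("u3", "m1", 1.0), ("u3", "m3", 5.0), ("u3", "m4", 2.0),
]

mu, predict = train_biased_mf(toy)
baseline_rmse = rmse(lambda u, i: mu, toy)   # always predicting the global mean
trained_rmse = rmse(predict, toy)
# Estimate for a movie u1 has not rated, clipped to the 0.5 to 5.0 rating scale
estimate = min(5.0, max(0.5, predict("u1", "m4")))
print(f"baseline RMSE={baseline_rmse:.3f}, trained RMSE={trained_rmse:.3f}, u1/m4 estimate={estimate:.2f}")
```

The trained model fits the toy ratings better than the global-mean baseline, which is the same effect the SVD model exploits at scale. This sketch only handles users and movies seen during training.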



In [4]:
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split

df = pd.read_csv("cleaned_movie_ratings.csv") 


reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)

# Split data into train and test sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)


In [5]:
# Initialize and train the model
svd = SVD()
svd.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x120d7e840>

In [6]:
# Predict a specific user-movie rating
user_id = 1
movie_id = 120 
prediction = svd.predict(user_id, movie_id)
print(f"Predicted rating for User {user_id} on Movie {movie_id}: {prediction.est:.2f}")


Predicted rating for User 1 on Movie 120: 3.11


In [7]:
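def get_top_n_recommendations(user_id, df, model, n=10):
    """Return the n unseen movies with the highest predicted rating for user_id."""
    # A set gives constant-time membership checks when filtering unseen movies
    rated_movies = set(df[df['userId'] == user_id]['movieId'].tolist())
    all_movies = df['movieId'].unique()
    to_predict = [m for m in all_movies if m not in rated_movies]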
    
    predictions = [model.predict(user_id, movie_id) for movie_id in to_predict]
    top_preds = sorted(predictions, key=lambda x: x.est, reverse=True)[:n]
    
    movie_map = df[['movieId', 'title']].drop_duplicates().set_index('movieId')['title'].to_dict()
    return [(movie_map.get(pred.iid, 'Unknown'), round(pred.est, 2)) for pred in top_preds]

top_recs = get_top_n_recommendations(user_id=1, df=df, model=svd, n=10)
for title, score in top_recs:
    print(f"{title}: {score}")


The Million Dollar Hotel: 4.22
Once Were Warriors: 4.07
Rome, Open City: 4.04
While You Were Sleeping: 4.0
Dog Day Afternoon: 3.97
Space Jam: 3.95
The Thomas Crown Affair: 3.95
Galaxy Quest: 3.95
Cousin, Cousine: 3.94
Dead Man: 3.93
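
The selection step in `get_top_n_recommendations` (filter out rated movies, score the rest, keep the best N) can also be written with `heapq.nlargest`, which avoids sorting every prediction. The sketch below isolates that step; `top_n` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in for `model.predict(...).est` so the example runs without a trained model.

```python
import heapq

def top_n(user_id, rated_ids, all_ids, predict, n=10):
    """Return (item, score) pairs for the n unseen items with the highest predicted score."""
    rated = set(rated_ids)  # constant-time membership checks
    unseen = (item for item in all_ids if item not in rated)
    scored = ((predict(user_id, item), item) for item in unseen)
    # nlargest keeps only n candidates in memory; ties are broken by item id
    return [(item, round(score, 2)) for score, item in heapq.nlargest(n, scored)]

def fake_predict(user_id, item_id):
    """Hypothetical scoring function standing in for model.predict(...).est."""
    return 3.0 + (item_id % 5) * 0.4

result = top_n(1, rated_ids=[2, 4], all_ids=range(1, 11), predict=fake_predict, n=3)
print(result)
```

For a catalogue of a few thousand movies the difference from a full sort is small, so this is mainly a readability and scaling choice.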


### Model Performance 

In [9]:
# pandas and the Surprise objects (svd, testset) are already available from the cells above
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, mean_squared_error, mean_absolute_error


# Get predictions on test set
predictions = svd.test(testset)

# Actual and predicted ratings
y_true = [pred.r_ui for pred in predictions]
y_pred = [pred.est for pred in predictions]

# Regression metrics
# Take the square root explicitly; the `squared` argument is deprecated in recent scikit-learn releases
rmse = mean_squared_error(y_true, y_pred) ** 0.5
mae = mean_absolute_error(y_true, y_pred)

# Convert to binary labels (e.g., relevant = rating ≥ 3.5)
threshold = 3.5
y_true_bin = [1 if r >= threshold else 0 for r in y_true]
y_pred_bin = [1 if r >= threshold else 0 for r in y_pred]

# Classification metrics
accuracy_val = accuracy_score(y_true_bin, y_pred_bin)
precision_val = precision_score(y_true_bin, y_pred_bin)
recall_val = recall_score(y_true_bin, y_pred_bin)
f1_val = f1_score(y_true_bin, y_pred_bin)


print("\n Evaluation Metrics (on Test Set):")
print(f"RMSE      : {rmse:.4f}")
print(f"MAE       : {mae:.4f}")
print(f"Accuracy  : {accuracy_val:.4f}")
print(f"Precision : {precision_val:.4f}")
print(f"Recall    : {recall_val:.4f}")
print(f"F1 Score  : {f1_val:.4f}")



 Evaluation Metrics (on Test Set):
RMSE      : 0.9037
MAE       : 0.6934
Accuracy  : 0.6976
Precision : 0.7779
Recall    : 0.7127
F1 Score  : 0.7439




##### **Model Performance Metrics**

Values are taken from the test-set output shown above. Because `SVD()` is initialized randomly (no `random_state` is set), they shift slightly from run to run.

| Metric     | Value   | What It Means |
|------------|---------|---------------|
| **RMSE**       | `0.9037` | Root mean squared error between predicted and true ratings, about **0.90 points** on the 0.5 to 5.0 scale. Large errors are penalized more heavily. Lower is better. |
| **MAE**        | `0.6934` | The average absolute difference between predicted and actual ratings is approximately **0.69 points**. |
| **Accuracy**   | `0.6976` | About **69.76%** of the model’s binary predictions (liked means rating ≥ 3.5) were correct. |
| **Precision**  | `0.7779` | Approximately **77.79%** of movies predicted as “liked” were actually liked. |
| **Recall**     | `0.7127` | The model identified about **71.27%** of the movies that users actually liked. |
| **F1 Score**   | `0.7439` | The harmonic mean of precision and recall, summarizing both in a single number. |
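
To show what each metric in the table measures, here is a small hand-rolled version in plain Python, applied to five made-up rating pairs. It uses the same 3.5 "liked" threshold as the evaluation cell; the sample values and function names are invented for illustration and are unrelated to the results above.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """RMSE and MAE between actual and predicted ratings."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

def classification_metrics(y_true, y_pred, threshold=3.5):
    """Accuracy, precision, recall, and F1 after thresholding ratings into liked / not liked."""
    actual = [t >= threshold for t in y_true]
    predicted = [p >= threshold for p in y_pred]
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up actual and predicted ratings
y_true = [4.0, 3.0, 5.0, 2.0, 4.5]
y_pred = [3.8, 3.6, 4.1, 2.5, 3.2]

rmse, mae = regression_metrics(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(f"RMSE={rmse:.4f} MAE={mae:.4f}", metrics)
```

Note that the classification metrics depend entirely on the chosen threshold, so they are best read alongside RMSE and MAE rather than on their own.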

### Movie Recommendation Webpage using Collaborative Filtering

This project implements a **collaborative filtering-based movie recommendation system** using **Singular Value Decomposition (SVD)**. It leverages user rating data to learn patterns and suggest movies that a user is likely to enjoy, based on the preferences of similar users.

The application is built with **Streamlit**, providing an intuitive and interactive web interface. Users can:

- **Input their User ID** to receive personalized movie recommendations.
- **Predict ratings** for specific movies.

#### **How it Works:**

1. **Data Preparation**:  
   Loads and preprocesses the ratings dataset.

2. **Model Training (SVD)**:  
   Captures latent features of users and movies from user-item interactions.

3. **Personalized Recommendations**:  
   Recommends the unseen movies with the highest predicted ratings for each user.

4. **Rating Prediction**:  
   Predicts how a specific user might rate an individual movie.

✅ **This approach** relies entirely on user behavior and rating patterns and needs no metadata such as genres or plot summaries. That makes it well suited to personalization whenever enough rating history is available.


In [12]:
with open("app1.py", "w") as f:
    f.write('''import streamlit as st
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split

# --- Load the dataset ---
@st.cache_data
def load_data():
    return pd.read_csv("cleaned_movie_ratings.csv")

df = load_data()

# --- Prepare Surprise data and train SVD (cached) ---
@st.cache_resource
def train_model(dataframe):
    reader = Reader(rating_scale=(0.5, 5.0))
    data = Dataset.load_from_df(dataframe[['userId', 'movieId', 'rating']], reader)
    trainset, _ = train_test_split(data, test_size=0.2, random_state=42)
    model = SVD()
    model.fit(trainset)
    return model

svd = train_model(df)

# --- Create mapping for movie titles ---
movie_map = df[['movieId', 'title']].drop_duplicates().set_index('movieId')['title'].to_dict()
title_map = df[['title', 'movieId']].drop_duplicates().set_index('title')['movieId'].to_dict()

# --- Recommendation Function ---
def get_top_n_recommendations(user_id, df, model, n=10):
    rated_movies = df[df['userId'] == user_id]['movieId'].tolist()
    all_movies = df['movieId'].unique()
    to_predict = [m for m in all_movies if m not in rated_movies]

    predictions = [model.predict(user_id, movie_id) for movie_id in to_predict]
    top_preds = sorted(predictions, key=lambda x: x.est, reverse=True)[:n]

    return [(movie_map.get(pred.iid, 'Unknown'), round(pred.est, 2)) for pred in top_preds]

# --- Streamlit UI ---
st.title("🎬 Collaborative Filtering (SVD) Recommender")

user_id_input = st.number_input("Enter User ID:", min_value=1, step=1)
num_recs = st.slider("How many recommendations?", 5, 20, 10)

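if st.button("Get Recommendations"):
    # The model still returns estimates for unknown users (global mean plus item biases),
    # so check explicitly whether the user has any ratings in the dataset.
    if user_id_input not in df['userId'].values:
        st.warning("No ratings found for this user, so recommendations would not be personalized.")
    else:
        top_recs = get_top_n_recommendations(user_id=user_id_input, df=df, model=svd, n=num_recs)
        if top_recs:
            st.subheader(f"📌 Top {num_recs} Recommendations for User {user_id_input}")
            st.table(pd.DataFrame(top_recs, columns=["Movie Title", "Predicted Rating"]))
        else:
            st.warning("No unseen movies left to recommend for this user.")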

# Optional: Predict rating for a specific movie
st.markdown("---")
st.subheader("🎯 Predict Rating for a Specific Movie")
movie_title = st.selectbox("Choose a movie:", sorted(title_map.keys()))

if st.button("Predict Rating"):
    movie_id = title_map.get(movie_title)
    pred = svd.predict(user_id_input, movie_id)
    st.success(f"Predicted Rating for User {user_id_input} on '{movie_title}': {pred.est:.2f}")


''')

print("app1.py file created successfully.")

app1.py file created successfully.


In [None]:
!streamlit run app1.py


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8502
  Network URL: http://10.12.181.173:8502


#### **Limitations of Collaborative Filtering**

##### **Cold Start Problem**
- Collaborative filtering struggles with **new users** or **new movies** that have **no prior ratings**.
- Without sufficient historical data, the model cannot make accurate predictions or recommendations.

##### **Solutions to the Cold Start Problem:**

1. **Hybrid Recommendation Systems**:
   - Combine collaborative filtering with content-based filtering.
   - Leverage movie metadata (genres, descriptions, keywords) to generate recommendations when rating data is unavailable.

2. **Content-based User Onboarding**:
   - Ask new users to select preferred genres or rate a few initial movies.
   - Quickly establish an initial user preference profile.

3. **Popularity-based Recommendations**:
   - Temporarily suggest trending or top-rated movies to new users until enough interaction data is collected.
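
As one possible illustration of the popularity-based fallback, the sketch below routes users with too little rating history to a popularity list and everyone else to the personalized model. The `min_history` threshold, the damped-mean scoring with `prior_weight`, the toy ratings, and the `personalized` callable are all assumptions made for this example rather than part of the app above.

```python
from collections import defaultdict

def popular_movies(ratings, exclude=(), n=5, min_count=2, prior_weight=5):
    """Rank movies by a damped mean so a handful of high ratings cannot dominate."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    totals, counts = defaultdict(float), defaultdict(int)
    for _, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    scores = {
        movie: (totals[movie] + prior_weight * global_mean) / (counts[movie] + prior_weight)
        for movie in counts
        if counts[movie] >= min_count and movie not in exclude
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

def recommend(user_id, ratings, personalized, min_history=3, n=5):
    """Fall back to popularity when the user has fewer than min_history ratings."""
    seen = {movie for user, movie, _ in ratings if user == user_id}
    if len(seen) < min_history:
        return "popularity", popular_movies(ratings, exclude=seen, n=n)
    return "personalized", personalized(user_id, n)

# Toy (user, movie, rating) triples, invented for illustration
ratings = [
    (1, "Heat", 5.0), (1, "Alien", 4.0), (1, "Clue", 2.0),
    (2, "Heat", 4.5), (2, "Alien", 3.5),
    (3, "Heat", 4.0), (3, "Clue", 1.5), (3, "Big", 5.0),
]

def personalized(user_id, n):
    """Hypothetical stand-in for the SVD-based top-N function."""
    return ["Big"][:n]

new_user = recommend(99, ratings, personalized)   # no history: popularity list
sparse_user = recommend(2, ratings, personalized) # two ratings: popularity, minus seen movies
known_user = recommend(1, ratings, personalized)  # enough history: personalized
print(new_user, sparse_user, known_user)
```

A hybrid system would replace the popularity branch (or blend it) with content-based scores derived from movie metadata; the routing logic stays the same.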