

# 🤝 **Collaborative Filtering (CF)**



## 🔥 **What is Collaborative Filtering?**

- Collaborative Filtering is a **recommendation technique** that suggests items by looking at the preferences of users who are similar to you.
- It's like a buddy system: if people who liked what you liked also enjoyed other things, those get recommended to you! 🎯💡

---


## 🛠️ **Types of Collaborative Filtering**

| 👤 **User-Based CF**                        | 🎁 **Item-Based CF**                            |
| ------------------------------------------- | ----------------------------------------------- |
| Finds people just like you!                 | Finds items just like your favorites!           |
| Recommends what your buddies liked          | Suggests items similar to what you already love |
| *“People who share your taste also liked…”* | *“If you liked this, try this!”*                |



## 🧮 **How Collaborative Filtering Works**

1️⃣ Collect data: Ratings, clicks, purchases — everything you do counts! 📊

2️⃣ Find similarities: Measure how alike users (or items) are, e.g. with cosine similarity or Pearson correlation 🔄

3️⃣ Predict your taste: Guess what you’d love based on your taste twins or similar items 🔮

4️⃣ Serve recommendations: Top picks just for YOU! 🎁🎉
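Step 2 can be made concrete with a small NumPy sketch. It compares Users 1 and 3 from the example ratings table in the METHODS section and computes both measures only over the items both users rated (`np.nan` marks a missing rating). The helper names `co_rated`, `cosine_sim` and `pearson_sim` are illustrative, not a library API.

```python
import numpy as np

def co_rated(a: np.ndarray, b: np.ndarray):
    """Keep only the positions where both users have a rating (not NaN)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    return a[mask], b[mask]

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    x, y = co_rated(a, b)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation: cosine similarity after subtracting each user's mean rating."""
    x, y = co_rated(a, b)
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Users 1 and 3 from the example ratings table (Movie A, Song B, Book C, Product D, Game E)
user1 = np.array([5, 3, 4, 4, np.nan])
user3 = np.array([4, 3, 4, 3, 5])
print(round(cosine_sim(user1, user3), 3))   # 0.992
print(round(pearson_sim(user1, user3), 3))  # 0.707
```

Cosine similarity comes out close to 1 because both users rate everything fairly high. Pearson subtracts each user's average first, so it shows a weaker (but still positive) agreement.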



## 🎯 **Key Concepts & Terms**

* **User-Item Matrix:** A big table showing user ratings for items (many missing values). 🗃️
* **Similarity Scores:** Numbers that show how close users or items are. Cosine similarity lies between 0 and 1 when all ratings are non-negative; Pearson correlation lies between -1 and 1.
* **Sparsity:** Most users rate only a few items, so the matrix is mostly empty. This can make finding similarities harder. 😕
* **Cold Start Problem:** Difficulty recommending for brand-new users or items with no history. ❄️
* **Scalability:** Handling millions of users and items efficiently is a big challenge! 🚀
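Sparsity can be measured as the share of empty cells. Here is a minimal sketch on a made-up 3 x 4 matrix (`np.nan` = not rated); `sparsity` is a hypothetical helper name.

```python
import numpy as np

# A tiny made-up user-item matrix: rows = users, columns = items, np.nan = not rated
ratings = np.array([
    [5, np.nan, np.nan, 1],
    [np.nan, 4, np.nan, np.nan],
    [np.nan, np.nan, 2, np.nan],
])

def sparsity(matrix: np.ndarray) -> float:
    """Fraction of user-item cells that have no rating."""
    return float(np.isnan(matrix).sum() / matrix.size)

print(f"Sparsity: {sparsity(ratings):.0%}")  # Sparsity: 67%
```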

## 🎯 **Why Everyone Loves Collaborative Filtering**

* 🎨 *Highly Personalized*: It’s made just for your unique style!
* 🕵️‍♂️ *No need for item details*: Works with pure user behavior!
* 🌍 *Proven & popular*: Netflix, Amazon, Spotify all swear by it!



## 🚧 **Watch Out! Challenges to Know**

| ⚠️ **Challenge**   | 🚩 **What’s the Issue?**                                |
| ------------------ | ------------------------------------------------------- |
| ❄️ Cold Start      | New users/items? No data = no recommendations yet!      |
| 🕳️ Data Sparsity  | Most users rate few items, so finding matches is tricky |
| 🐢 Scalability     | Millions of users & items can slow things down!         |
| 🌟 Popularity Bias | Tends to push already popular items more often          |



## 🛒 **Real-World Examples**

* 🎬 Netflix suggesting movies based on what similar viewers watched
* 🛍️ Amazon recommending products bought by customers with similar tastes
* 🎵 Spotify creating personalized playlists by analyzing what others like you listen to



✨ **Summary:**
 Collaborative Filtering is like getting recommendations from friends who have tastes like yours, helping you discover things you’ll love! It’s powerful, popular, and personal — but needs enough data to work best.





---

# 🤝✨ **Collaborative Filtering METHODS** ✨🤝



### 📊 **Dataset Example (User-Item Ratings Matrix)**

| 👤 **User \ Item** | 🎬 Movie A     | 🎵 Song B      | 📚 Book C      | 🛍️ Product D  | 🎮 Game E      |
| ------------------ | -------------- | -------------- | -------------- | -------------- | -------------- |
| **User 1**         | ⭐️⭐️⭐️⭐️⭐️ (5) | ⭐️⭐️⭐️ (3)     | ⭐️⭐️⭐️⭐️ (4)   | ⭐️⭐️⭐️⭐️ (4)   | ❌ (No rating)  |
| **User 2**         | ⭐️⭐️⭐️ (3)     | ⭐️ (1)         | ⭐️⭐️ (2)       | ⭐️⭐️⭐️ (3)     | ⭐️⭐️⭐️ (3)     |
| **User 3**         | ⭐️⭐️⭐️⭐️ (4)   | ⭐️⭐️⭐️ (3)     | ⭐️⭐️⭐️⭐️ (4)   | ⭐️⭐️⭐️ (3)     | ⭐️⭐️⭐️⭐️⭐️ (5) |
| **User 4**         | ⭐️⭐️⭐️ (3)     | ⭐️⭐️⭐️ (3)     | ❌ (No rating)  | ⭐️⭐️⭐️⭐️⭐️ (5) | ⭐️⭐️⭐️⭐️ (4)   |
| **User 5**         | ⭐️ (1)         | ⭐️⭐️⭐️⭐️⭐️ (5) | ⭐️⭐️⭐️⭐️⭐️ (5) | ⭐️⭐️ (2)       | ⭐️ (1)         |

---

## 🧠 **Collaborative Filtering Explained in Simple Words:**


### 1️⃣ **User-Based Collaborative Filtering (User to User)**

* **Idea:** "I suggest the things liked by users whose taste matches yours."
* The system finds users whose ratings are similar to yours.
* It then recommends the items those users liked.

**Example:**
If you and User 3 both gave high ratings to Movie A and Book C, and User 3 also liked Game E, the system will recommend Game E to you. 🎯
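The example above can be computed directly from the ratings table. This is a minimal sketch assuming cosine similarity over co-rated items and a weighted average over the `k = 2` most similar users who rated the target item; those choices and the names `cosine_on_common` and `predict_user_based` are illustrative, not a fixed recipe.

```python
import numpy as np
import pandas as pd

# The example ratings table (np.nan = no rating)
ratings = pd.DataFrame(
    {
        "Movie A":   [5, 3, 4, 3, 1],
        "Song B":    [3, 1, 3, 3, 5],
        "Book C":    [4, 2, 4, np.nan, 5],
        "Product D": [4, 3, 3, 5, 2],
        "Game E":    [np.nan, 3, 5, 4, 1],
    },
    index=["User 1", "User 2", "User 3", "User 4", "User 5"],
)

def cosine_on_common(a: pd.Series, b: pd.Series) -> float:
    """Cosine similarity computed only over the items both users rated."""
    both = a.notna() & b.notna()
    x, y = a[both].to_numpy(), b[both].to_numpy()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def predict_user_based(ratings: pd.DataFrame, user: str, item: str, k: int = 2) -> float:
    """Similarity-weighted average of the item's ratings from the k most similar users."""
    mask = (ratings.index != user) & ratings[item].notna().to_numpy()
    others = ratings.index[mask]
    sims = pd.Series({o: cosine_on_common(ratings.loc[user], ratings.loc[o]) for o in others})
    top = sims.sort_values(ascending=False).head(k)
    return float((top * ratings.loc[top.index, item]).sum() / top.sum())

print(predict_user_based(ratings, "User 1", "Game E"))
```

For User 1 and Game E this predicts about 4.0. User 3 is the closest neighbour (similarity about 0.99, rated Game E 5), followed by User 2 (about 0.98, rated it 3).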



### 2️⃣ **Item-Based Collaborative Filtering (Item to Item)**

* **Idea:** "If you like an item, I suggest other items similar to it."
* The system checks how similar items are, i.e. whether the same people tend to rate them alike.
* It recommends items similar to the ones you like.

**Example:**
If users gave Movie A and Product D similar ratings, and you liked Movie A, then Product D is a good suggestion for you. 🎬➡️🛍️
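Item similarity can be checked on the same example table. This minimal sketch assumes plain cosine similarity over the users who rated both items; `item_similarity` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# The example ratings table (np.nan = no rating)
ratings = pd.DataFrame(
    {
        "Movie A":   [5, 3, 4, 3, 1],
        "Song B":    [3, 1, 3, 3, 5],
        "Book C":    [4, 2, 4, np.nan, 5],
        "Product D": [4, 3, 3, 5, 2],
        "Game E":    [np.nan, 3, 5, 4, 1],
    },
    index=["User 1", "User 2", "User 3", "User 4", "User 5"],
)

def item_similarity(ratings: pd.DataFrame, item_a: str, item_b: str) -> float:
    """Cosine similarity of two item columns, using only users who rated both."""
    both = ratings[item_a].notna() & ratings[item_b].notna()
    x = ratings.loc[both, item_a].to_numpy()
    y = ratings.loc[both, item_b].to_numpy()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

sims = {other: round(item_similarity(ratings, "Movie A", other), 3)
        for other in ratings.columns if other != "Movie A"}
print(sims)
```

Product D comes out at about 0.94, well above Song B (about 0.78). Game E scores even higher (about 0.99), but only four users rated both Movie A and Game E. This is a side effect of sparsity: similarities computed from few co-ratings can look inflated.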



## 🔑 **Difference in Simple Words**

| 👥 **User-Based**                                | 🎯 **Item-Based**                                        |
| ------------------------------------------------ | -------------------------------------------------------- |
| Based on users with similar taste                | Based on similarity between items                        |
| Suggests what similar users liked                | Suggests items similar to the ones you like              |


## 📋 **Why Is the Dataset Format Important?**

* Rows hold **Users**
* Columns hold **Items**
* Each cell holds a **Rating** (1-5 stars)
* If a user has not rated an item, the cell is **NaN / blank**
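Interaction data often arrives in "long" form (one row per user, item and rating), not as a matrix. Here is a minimal sketch of reshaping it into the user-item format with pandas `DataFrame.pivot`; the `log` data is made up for illustration.

```python
import pandas as pd

# Hypothetical interaction log in "long" format: one row per (user, item, rating)
log = pd.DataFrame({
    "user":   ["User 1", "User 1", "User 2", "User 3"],
    "item":   ["Movie A", "Song B", "Movie A", "Game E"],
    "rating": [5, 3, 3, 5],
})

# Rows = users, columns = items, cells = rating; user-item pairs never rated become NaN
matrix = log.pivot(index="user", columns="item", values="rating")
print(matrix)
```

If the same user can rate the same item more than once, `pivot` raises an error; `pivot_table` with an aggregation such as `aggfunc="mean"` handles duplicates.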



## 🚀 **Use of Data in Both Methods:**

| Method        | Data Usage                                                                     |
| ------------- | ------------------------------------------------------------------------------ |
| User-Based CF | Find similar users based on ratings, then recommend the items they liked        |
| Item-Based CF | Find similar items based on ratings, then recommend items similar to your likes |



## 🎉 **Summary in 2 Lines:**

* **User-Based:** "I recommend what users like you enjoy."
* **Item-Based:** "I recommend things similar to what you already like."



---
---
## **USER-BASED COLLABORATIVE FILTERING**
---
---

```python
# LOADING DATA :
import pandas as pd
DATA=pd.read_csv(r"C:\Users\Nagesh Agrawal\OneDrive\Desktop\6_MACHINE LEARNING\1_DATASETS\COLLABORATIVE FILTERING.csv")
DATA
```

|     | USER_ID1 | Drishyam | Queen | Andhadhun | Article 15 | Zindagi Na Milegi Dobara | KGF | Raazi | Barfi! | Tanhaji | ... | RRR | Sita Ramam | Kantara | Uri: The Surgical Strike | A Wednesday | Kahaani | The Lunchbox | Piku | Pink | Sardar Udham |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NageshAgrawal | 2.0 | 4.0 | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 2.0 | ... | 2.0 | 0.0 | 0.0 | 1.0 | 5.0 | 0.0 | 2.0 | 4.0 | 0.0 | 1.0 |
| 1 | UjwalAgrawal | 4.0 | 2.0 | 0.0 | 0.0 | 0.0 | 3.0 | 5.0 | 5.0 | 3.0 | ... | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 | 3.0 | 5.0 | 1.0 | 3.0 |
| 2 | AdvayeAgrawal | 4.0 | 5.0 | 3.0 | 4.0 | 5.0 | 4.0 | 2.0 | 1.0 | 5.0 | ... | 5.0 | 4.0 | 0.0 | 5.0 | 0.0 | 3.0 | 5.0 | 3.0 | 3.0 | 4.0 |
| 3 | SaurabhAgrawal | 5.0 | 0.0 | 5.0 | 3.0 | 3.0 | 5.0 | 2.0 | 1.0 | 3.0 | ... | 2.0 | 2.0 | 1.0 | 1.0 | 0.0 | 3.0 | 3.0 | 4.0 | 2.0 | 0.0 |
| 4 | KaushalJangid | 5.0 | 1.0 | 3.0 | 5.0 | 0.0 | 3.0 | 4.0 | 2.0 | 0.0 | ... | 0.0 | 4.0 | 4.0 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 5.0 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | RahulPatel | 1.0 | 5.0 | 1.0 | 3.0 | 4.0 | 1.0 | 0.0 | 2.0 | 3.0 | ... | 2.0 | 1.0 | 5.0 | 0.0 | 2.0 | 4.0 | 5.0 | 5.0 | 3.0 | 3.0 |
| 96 | NehaSharma | 3.0 | 5.0 | 5.0 | 2.0 | 3.0 | 0.0 | 5.0 | 0.0 | 5.0 | ... | 4.0 | 4.0 | 1.0 | 4.0 | 5.0 | 2.0 | 5.0 | 4.0 | 0.0 | 1.0 |
| 97 | VijayJoshi | 2.0 | 4.0 | 2.0 | 5.0 | 5.0 | 1.0 | 4.0 | 4.0 | 3.0 | ... | 5.0 | 4.0 | 5.0 | 3.0 | 0.0 | 4.0 | 5.0 | 5.0 | 2.0 | 5.0 |
| 98 | PallaviKapoor | 2.0 | 3.0 | 3.0 | 0.0 | 5.0 | 5.0 | 5.0 | 4.0 | 0.0 | ... | 3.0 | 2.0 | 5.0 | 5.0 | 3.0 | 1.0 | 5.0 | 1.0 | 5.0 | 0.0 |


```python
# SET INDEX USER ID COLUMN :
USER_BASED_DATA = DATA.set_index('USER_ID1')
```

```python
USER_BASED_DATA
```

| USER_ID1 | Drishyam | Queen | Andhadhun | Article 15 | Zindagi Na Milegi Dobara | KGF | Raazi | Barfi! | Tanhaji | Udaan | RRR | Sita Ramam | Kantara | Uri: The Surgical Strike | A Wednesday | Kahaani | The Lunchbox | Piku | Pink | Sardar Udham |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NageshAgrawal | 2.0 | 4.0 | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | 5.0 | 2.0 | 3.0 | 2.0 | 0.0 | 0.0 | 1.0 | 5.0 | 0.0 | 2.0 | 4.0 | 0.0 | 1.0 |
| UjwalAgrawal | 4.0 | 2.0 | 0.0 | 0.0 | 0.0 | 3.0 | 5.0 | 5.0 | 3.0 | 5.0 | 1.0 | 4.0 | 2.0 | 1.0 | 1.0 | 4.0 | 3.0 | 5.0 | 1.0 | 3.0 |
| AdvayeAgrawal | 4.0 | 5.0 | 3.0 | 4.0 | 5.0 | 4.0 | 2.0 | 1.0 | 5.0 | 1.0 | 5.0 | 4.0 | 0.0 | 5.0 | 0.0 | 3.0 | 5.0 | 3.0 | 3.0 | 4.0 |
| SaurabhAgrawal | 5.0 | 0.0 | 5.0 | 3.0 | 3.0 | 5.0 | 2.0 | 1.0 | 3.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 0.0 | 3.0 | 3.0 | 4.0 | 2.0 | 0.0 |
| KaushalJangid | 5.0 | 1.0 | 3.0 | 5.0 | 0.0 | 3.0 | 4.0 | 2.0 | 0.0 | 2.0 | 0.0 | 4.0 | 4.0 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 5.0 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| RahulPatel | 1.0 | 5.0 | 1.0 | 3.0 | 4.0 | 1.0 | 0.0 | 2.0 | 3.0 | 3.0 | 2.0 | 1.0 | 5.0 | 0.0 | 2.0 | 4.0 | 5.0 | 5.0 | 3.0 | 3.0 |
| NehaSharma | 3.0 | 5.0 | 5.0 | 2.0 | 3.0 | 0.0 | 5.0 | 0.0 | 5.0 | 3.0 | 4.0 | 4.0 | 1.0 | 4.0 | 5.0 | 2.0 | 5.0 | 4.0 | 0.0 | 1.0 |
| VijayJoshi | 2.0 | 4.0 | 2.0 | 5.0 | 5.0 | 1.0 | 4.0 | 4.0 | 3.0 | 0.0 | 5.0 | 4.0 | 5.0 | 3.0 | 0.0 | 4.0 | 5.0 | 5.0 | 2.0 | 5.0 |
| PallaviKapoor | 2.0 | 3.0 | 3.0 | 0.0 | 5.0 | 5.0 | 5.0 | 4.0 | 0.0 | 3.0 | 3.0 | 2.0 | 5.0 | 5.0 | 3.0 | 1.0 | 5.0 | 1.0 | 5.0 | 0.0 |


```python
from sklearn.metrics.pairwise import cosine_similarity
# Pairwise cosine similarity between users (rows of USER_BASED_DATA)
USER_SIMILARITY = pd.DataFrame( cosine_similarity( USER_BASED_DATA ), index=USER_BASED_DATA.index, columns=USER_BASED_DATA.index )
```

```python
user = 'NageshAgrawal'
# Top 3 most similar users; drop the user's own entry instead of assuming it sorts first
similar_users = USER_SIMILARITY[user].drop(user).sort_values(ascending=False).head(3)
similar_users
```

```
USER_ID1
AnitaKapoor    0.860792
NeetaGupta     0.860715
AnuSharma      0.841929
Name: NageshAgrawal, dtype: float64
```
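The notebook stops after finding Nagesh's three most similar users. The next step of user-based CF is to score the movies he has not rated from those neighbours' ratings. Below is a minimal sketch of that step, assuming that a 0 in the ratings means "not rated". The function name `recommend_user_based`, its parameters `k` and `n`, and the tiny `demo` table are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_user_based(ratings: pd.DataFrame, user: str, k: int = 3, n: int = 5) -> pd.Series:
    """Score the user's unrated items (0 = not rated) with a similarity-weighted
    average of the ratings given by the k most similar users."""
    sim = pd.DataFrame(cosine_similarity(ratings), index=ratings.index, columns=ratings.index)
    neighbours = sim[user].drop(user).sort_values(ascending=False).head(k)
    neighbour_ratings = ratings.loc[neighbours.index]
    # Weighted sum of the neighbours' ratings for every item
    weighted = neighbour_ratings.mul(neighbours, axis=0).sum()
    # Sum of weights, counting only neighbours who actually rated the item (> 0)
    weights = (neighbour_ratings > 0).mul(neighbours, axis=0).sum()
    scores = weighted / weights.replace(0, np.nan)
    unrated = ratings.loc[user] == 0
    return scores[unrated].dropna().sort_values(ascending=False).head(n)

# Hypothetical mini dataset in the same shape as USER_BASED_DATA (users x movies)
demo = pd.DataFrame(
    [[5, 4, 0, 0], [5, 4, 5, 1], [1, 0, 2, 5], [4, 5, 4, 0]],
    index=["A", "B", "C", "D"],
    columns=["Queen", "Piku", "Kahaani", "Pink"],
)
print(recommend_user_based(demo, "A", k=2))
```

On the notebook's data the call would be `recommend_user_based(USER_BASED_DATA, 'NageshAgrawal')`. Counting only neighbours who rated an item in the denominator keeps a neighbour's 0 ("not rated") from being read as a very low rating.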

---
---
## **ITEM-BASED COLLABORATIVE FILTERING**

---
---


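### 🔹 **What Is Item-Based Collaborative Filtering?**

It measures the similarity between **items (such as movies)**: if a user liked certain movies, **movies similar to them** are recommended. The notebook names its user-item DataFrame `CONTENT_BASED_DATA`, but the method here is item-based collaborative filtering, not content-based filtering. It uses only ratings, not movie features such as genre.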

---

## 🔧 **Code Breakdown**


### ✅ Step 1: Set `USER_ID1` as the index

```python
CONTENT_BASED_DATA = DATA.set_index('USER_ID1')
```

Now:

* Rows → Users
* Columns → Movies
* Values → Ratings (0 to 5; the data contains no NaN values, so 0 appears to stand for "not rated")



### ✅ Step 2: Transpose the matrix

```python
ITEM_MATRIX = CONTENT_BASED_DATA.T
```

Now:

* Rows → Movies (items)
* Columns → Users

> We transpose because we want the similarity between items, and `cosine_similarity` compares rows.



### ✅ Step 3: Compute cosine similarity

```python
from sklearn.metrics.pairwise import cosine_similarity
ITEM_SIMILARITY = pd.DataFrame(cosine_similarity(ITEM_MATRIX), index=ITEM_MATRIX.index, columns=ITEM_MATRIX.index)
```

This gives a matrix in which:

* Each cell is the similarity between one movie and another (between 0 and 1, because all ratings are non-negative).



### ✅ Step 4: Select the target user

```python
TARGET_USER = 'RahulSharma'
USER_RATING = CONTENT_BASED_DATA.loc[TARGET_USER]
```

Now we look only at **RahulSharma**'s ratings.



### ✅ Step 5: Compute the weighted score

```python
WEIGHTED_SCORE = ITEM_SIMILARITY.dot(USER_RATING)
```

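For each movie, this multiplies its similarity to every other movie by the user's rating of that movie and adds up the results, so movies that resemble the ones the user rated highly get a larger sum. Movies rated 0 add nothing to the sum.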



### ✅ Step 6: Normalize

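```python
RATED_ITEMS = USER_RATING[USER_RATING > 0].index            # assumes 0 = not rated
SIMILARITY_SUM = ITEM_SIMILARITY[RATED_ITEMS].sum(axis=1)   # similarity to rated movies only
RECOMMENDATION_SCORE = WEIGHTED_SCORE / SIMILARITY_SUM
```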

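Dividing by the sum of similarities turns the weighted sum into a weighted average on the same scale as the ratings. Only the movies the user actually rated belong in this sum; `USER_RATING.index` on its own contains every movie.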
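A small worked example of Steps 5 and 6 shows the arithmetic. The three ratings are RahulSharma's actual values from the `USER_RATING` output (Andhadhun 5, Barfi! 5, Drishyam 1). The similarity values are made up for illustration.

```python
import pandas as pd

# Made-up similarities of one unrated movie to three movies the user rated
similarity = pd.Series({"Andhadhun": 0.9, "Barfi!": 0.5, "Drishyam": 0.2})
# RahulSharma's ratings for those three movies
user_rating = pd.Series({"Andhadhun": 5.0, "Barfi!": 5.0, "Drishyam": 1.0})

weighted_score = similarity.dot(user_rating)   # 0.9*5 + 0.5*5 + 0.2*1 = 7.2
similarity_sum = similarity.sum()              # 0.9 + 0.5 + 0.2 = 1.6
prediction = weighted_score / similarity_sum   # 7.2 / 1.6 = 4.5
print(round(prediction, 4))
```

Without the division, the raw sum (7.2) would grow with the number of rated movies. After normalizing, the result (4.5) reads as a predicted rating.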



### ✅ Step 7: Remove movies that are already rated

```python
# Intended: keep only USER_RATING.isna() movies; this data stores "not rated" as 0 instead of NaN
RECOMMENDATION_SCORE = RECOMMENDATION_SCORE[USER_RATING == 0]
```

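Movies the user has already rated are not recommended. Note that `~USER_RATING.notna()` is the same as `USER_RATING.isna()`, which only works if unrated movies are stored as NaN.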
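The difference between the two filters is easy to reproduce. This sketch uses four of RahulSharma's values from the `USER_RATING` output (where "not rated" is stored as 0) and the matching scores from the `RECOMMENDATION_SCORE` output, rounded.

```python
import numpy as np
import pandas as pd

# Four of RahulSharma's ratings; "not rated" is stored as 0, as in the notebook's data
zero_coded = pd.Series({"Queen": 0.0, "Andhadhun": 5.0, "Piku": 5.0, "Article 15": 0.0})
# Their scores from the RECOMMENDATION_SCORE output, rounded
scores = pd.Series({"Queen": 2.57, "Andhadhun": 2.64, "Piku": 2.64, "Article 15": 2.56})

# Nothing is NaN, so ~notna() is False everywhere and the filter keeps nothing
print(scores[~zero_coded.notna()])

# Convert the 0 placeholders to NaN first; now isna() finds the unrated movies
nan_coded = zero_coded.replace(0, np.nan)
print(scores[nan_coded.isna()])
```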



### ✅ Step 8: Print the top 5 recommendations

```python
TOP_RECOMMENDATION = RECOMMENDATION_SCORE.sort_values(ascending=False).head(5)
print("Top recommendations for", TARGET_USER)
print(TOP_RECOMMENDATION)
```



## 🎯 What Does the Output Mean?

Suppose the output were:

```
Top recommendations for RahulSharma
Queen                       4.5
Zindagi Na Milegi Dobara    4.3
Article 15                  4.0
...
```

This would mean:

* **Queen**, **Zindagi Na Milegi Dobara**, etc. are being recommended to RahulSharma.
* They are recommended because they are similar to movies Rahul already rated highly (e.g., Andhadhun = 5).



## 🧠 Simple Summary:

1. Computed the similarity between every pair of movies.
2. Looked at what RahulSharma has rated.
3. Used those ratings to score the remaining movies.
4. Recommended the top 5 among the movies he has not rated yet.
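Putting Steps 1 to 8 together, here is a minimal sketch of an item-based recommender that normalizes only over rated movies and skips movies the user already rated. It assumes rows are users, columns are movies, and 0 means "not rated". The function name `recommend_item_based` and the `demo` data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def recommend_item_based(ratings: pd.DataFrame, user: str, n: int = 5) -> pd.Series:
    """Item-based CF: score each movie the user has not rated (0) by a
    similarity-weighted average of the user's own ratings."""
    item_matrix = ratings.T                                   # rows = movies
    item_sim = pd.DataFrame(cosine_similarity(item_matrix),
                            index=item_matrix.index, columns=item_matrix.index)
    user_rating = ratings.loc[user]
    rated = user_rating[user_rating > 0]                      # movies the user rated
    unrated = user_rating[user_rating == 0].index             # candidates to recommend
    sims = item_sim.loc[unrated, rated.index]                 # unrated x rated similarities
    scores = sims.dot(rated) / sims.sum(axis=1).replace(0, np.nan)
    return scores.dropna().sort_values(ascending=False).head(n)

# Hypothetical mini dataset: users x movies, 0 = not rated
demo = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 5, 1], [1, 1, 2, 5], [5, 4, 4, 0]],
    index=["Rahul", "Priya", "Aman", "Neha"],
    columns=["Queen", "Piku", "Kahaani", "Pink"],
)
print(recommend_item_based(demo, "Rahul"))
```

With the notebook's data, `recommend_item_based(CONTENT_BASED_DATA, 'RahulSharma')` would score only Queen, Article 15 and Zindagi Na Milegi Dobara, the three movies stored as 0 in his row.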





```python
# Set USER_ID1 as index (users as rows, items as columns)
CONTENT_BASED_DATA = DATA.set_index('USER_ID1')
```

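```python
# Transpose to get item-user matrix
ITEM_MATRIX = CONTENT_BASED_DATA.T
ITEM_MATRIX
```

| USER_ID1 | NageshAgrawal | UjwalAgrawal | AdvayeAgrawal | SaurabhAgrawal | KaushalJangid | RahulSharma | AnjaliVerma | ArjunPatel | SnehaKumar | RaviSingh | ... | MeenaPatel | KaranSharma | SwatiJoshi | NitinKapoor | AnitaSingh | RahulPatel | NehaSharma | VijayJoshi | PallaviKapoor | KunalRaj |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Drishyam | 2.0 | 4.0 | 4.0 | 5.0 | 5.0 | 1.0 | 5.0 | 5.0 | 5.0 | 0.0 | ... | 5.0 | 0.0 | 3.0 | 5.0 | 3.0 | 1.0 | 3.0 | 2.0 | 2.0 | 3.0 |
| Queen | 4.0 | 2.0 | 5.0 | 0.0 | 1.0 | 0.0 | 0.0 | 3.0 | 1.0 | 3.0 | ... | 0.0 | 2.0 | 5.0 | 2.0 | 5.0 | 5.0 | 5.0 | 4.0 | 3.0 | 2.0 |
| Andhadhun | 5.0 | 0.0 | 3.0 | 5.0 | 3.0 | 5.0 | 5.0 | 4.0 | 3.0 | 4.0 | ... | 0.0 | 4.0 | 0.0 | 0.0 | 2.0 | 1.0 | 5.0 | 2.0 | 3.0 | 5.0 |
| Article 15 | 5.0 | 0.0 | 4.0 | 3.0 | 5.0 | 0.0 | 5.0 | 4.0 | 5.0 | 0.0 | ... | 0.0 | 2.0 | 0.0 | 4.0 | 1.0 | 3.0 | 2.0 | 5.0 | 0.0 | 5.0 |
| Zindagi Na Milegi Dobara | 5.0 | 0.0 | 5.0 | 3.0 | 0.0 | 0.0 | 4.0 | 2.0 | 2.0 | 1.0 | ... | 4.0 | 3.0 | 4.0 | 3.0 | 1.0 | 4.0 | 3.0 | 5.0 | 5.0 | 4.0 |
| KGF | 4.0 | 3.0 | 4.0 | 5.0 | 3.0 | 2.0 | 4.0 | 5.0 | 5.0 | 0.0 | ... | 0.0 | 5.0 | 3.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 5.0 | 0.0 |
| Raazi | 4.0 | 5.0 | 2.0 | 2.0 | 4.0 | 3.0 | 1.0 | 5.0 | 5.0 | 4.0 | ... | 5.0 | 5.0 | 5.0 | 4.0 | 4.0 | 0.0 | 5.0 | 4.0 | 5.0 | 1.0 |
| Barfi! | 5.0 | 5.0 | 1.0 | 1.0 | 2.0 | 5.0 | 3.0 | 4.0 | 4.0 | 1.0 | ... | 2.0 | 1.0 | 2.0 | 5.0 | 3.0 | 2.0 | 0.0 | 4.0 | 4.0 | 4.0 |
| Tanhaji | 2.0 | 3.0 | 5.0 | 3.0 | 0.0 | 4.0 | 3.0 | 5.0 | 1.0 | 3.0 | ... | 3.0 | 3.0 | 4.0 | 2.0 | 5.0 | 3.0 | 5.0 | 3.0 | 0.0 | 5.0 |
| Udaan | 3.0 | 5.0 | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 | 5.0 | 2.0 | 5.0 | ... | 5.0 | 1.0 | 3.0 | 3.0 | 5.0 | 3.0 | 3.0 | 0.0 | 3.0 | 5.0 |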


```python
# Compute item-item similarity using cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
ITEM_SIMILARITY = pd.DataFrame(cosine_similarity(ITEM_MATRIX), index=ITEM_MATRIX.index, columns=ITEM_MATRIX.index)
```

```python
# Pick a target user (e.g., 'RahulSharma')
TARGET_USER = 'RahulSharma'
USER_RATING = CONTENT_BASED_DATA.loc[TARGET_USER]
USER_RATING
```

```
Drishyam                    1.0
Queen                       0.0
Andhadhun                   5.0
Article 15                  0.0
Zindagi Na Milegi Dobara    0.0
KGF                         2.0
Raazi                       3.0
Barfi!                      5.0
Tanhaji                     4.0
Udaan                       2.0
RRR                         5.0
Sita Ramam                  2.0
Kantara                     2.0
Uri: The Surgical Strike    4.0
A Wednesday                 4.0
Kahaani                     4.0
The Lunchbox                1.0
Piku                        5.0
Pink                        2.0
Sardar Udham                1.0
Name: RahulSharma, dtype: float64
```

```python
# Calculate weighted sum of item similarities
WEIGHTED_SCORE = ITEM_SIMILARITY.dot(USER_RATING)
WEIGHTED_SCORE
```

```
Drishyam                    39.681902
Queen                       38.734538
Andhadhun                   40.152736
Article 15                  39.170158
Zindagi Na Milegi Dobara    38.676171
KGF                         38.833462
Raazi                       39.078236
Barfi!                      41.037534
Tanhaji                     41.024883
Udaan                       40.559719
RRR                         41.548681
Sita Ramam                  39.703648
Kantara                     40.566926
Uri: The Surgical Strike    40.649022
A Wednesday                 39.043794
Kahaani                     40.293884
The Lunchbox                39.554773
Piku                        42.149980
Pink                        39.025384
Sardar Udham                40.401009
dtype: float64
```

```python
# Normalize by the sum of similarities.
# Note: USER_RATING.index lists every movie (unrated ones hold 0), so this sums
# similarities over all 20 movies, not only the ones the user rated.
SIMILARITY_SUM = ITEM_SIMILARITY[USER_RATING.index].sum(axis=1)
RECOMMENDATION_SCORE = WEIGHTED_SCORE / SIMILARITY_SUM
RECOMMENDATION_SCORE
```

```
Drishyam                    2.601796
Queen                       2.567651
Andhadhun                   2.642401
Article 15                  2.556524
Zindagi Na Milegi Dobara    2.571086
KGF                         2.596375
Raazi                       2.613319
Barfi!                      2.656013
Tanhaji                     2.637433
Udaan                       2.621782
RRR                         2.641387
Sita Ramam                  2.611894
Kantara                     2.593987
Uri: The Surgical Strike    2.629978
A Wednesday                 2.631197
Kahaani                     2.641380
The Lunchbox                2.581543
Piku                        2.639443
Pink                        2.595191
Sardar Udham                2.573761
dtype: float64
```

```python
# Remove items already rated by the user.
# The intended filter below returned an empty Series because this data has no NaN values
# (unrated movies are stored as 0), so it was commented out:
# RECOMMENDATION_SCORE = RECOMMENDATION_SCORE[~USER_RATING.notna()]
RECOMMENDATION_SCORE = WEIGHTED_SCORE / (SIMILARITY_SUM + 1e-9)
RECOMMENDATION_SCORE
```

```
Drishyam                    2.601796
Queen                       2.567651
Andhadhun                   2.642401
Article 15                  2.556524
Zindagi Na Milegi Dobara    2.571086
KGF                         2.596375
Raazi                       2.613319
Barfi!                      2.656013
Tanhaji                     2.637433
Udaan                       2.621782
RRR                         2.641387
Sita Ramam                  2.611894
Kantara                     2.593987
Uri: The Surgical Strike    2.629978
A Wednesday                 2.631197
Kahaani                     2.641380
The Lunchbox                2.581543
Piku                        2.639443
Pink                        2.595191
Sardar Udham                2.573761
dtype: float64
```



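### 🔍 Why is `RECOMMENDATION_SCORE` empty?

`RECOMMENDATION_SCORE[~USER_RATING.notna()]` keeps only the movies whose rating is NaN. This dataset has no NaN values: a movie the user has not rated is stored as 0. As far as pandas can tell, the user has rated every movie, so `~USER_RATING.notna()` is `False` everywhere and the filter keeps nothing.

Other things worth ruling out:

1. **The user really has rated every item**: there is nothing left to recommend.
2. **A similarity sum is zero or NaN**: the score becomes NaN or infinite instead of a usable number.
3. **The user rated very few items**: scores can still be computed, but they rest on very little evidence.

✅ **To check:**

```python
print(USER_RATING.dropna())           # movies that count as rated (NaN entries are dropped)
print(USER_RATING.isna().sum())       # number of NaN (unrated) entries; 0 here
print(SIMILARITY_SUM.head())          # make sure the similarity sum is not zero or NaN
print(RECOMMENDATION_SCORE.head())    # check that scores were computed at all
```

⚒️ Adding a small epsilon, `WEIGHTED_SCORE / (SIMILARITY_SUM + 1e-9)`, only guards against division by zero. It does not filter anything, which is why the cell above still lists all 20 movies. The actual fix is to filter on the value that marks "not rated", which is 0 in this dataset:

```python
RECOMMENDATION_SCORE = RECOMMENDATION_SCORE[USER_RATING == 0]
```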



```python
# Show top 5 recommended items
TOP_RECOMMENDATION = RECOMMENDATION_SCORE.sort_values(ascending=False).head(5)
print("Top recommendations for", TARGET_USER)
print(TOP_RECOMMENDATION)
```

```
Top recommendations for RahulSharma
Barfi!       2.656013
Andhadhun    2.642401
RRR          2.641387
Kahaani      2.641380
Piku         2.639443
dtype: float64
```



---

### 📢 What the output means:

These movies (**Barfi!, Andhadhun, RRR, Kahaani, Piku**) get the highest scores for RahulSharma, but read them with care:

✅ Each score (e.g. 2.66) is a similarity-weighted average of Rahul's own ratings, not a similarity value.
✅ All 20 scores sit close to 2.6, which is the plain average of Rahul's 20 values, zeros included (52 / 20 = 2.6). That suggests the similarity weights are nearly uniform, so the gaps between the top movies are tiny.
✅ A higher score means a (slightly) better predicted match with his taste.
⚠️ Movies Rahul already rated were **not** removed: the filter was disabled in the cell above, and Rahul has already rated all five (Barfi! 5, Andhadhun 5, RRR 5, Kahaani 4, Piku 5).



### 📌 Summary:

* **Score** → similarity combined with the user's ratings.
* **Top movies** → should be movies Rahul has not rated yet that match his taste; in this run they are movies he already rated.
* **Then** → recommend those unrated movies to Rahul! 🎯

That's it!


---

**Good catch**: the **goal was to recommend *new* movies to Rahul**, ones he has **not watched yet**.


### 📌 But here is the likely problem:

The movies in the output are:

```
Barfi!, Andhadhun, RRR, Kahaani, Piku
```

➡️ If Rahul **had already rated** these movies, they should not have been recommended.


### 🔍 To Check:

You were using this line:

```python
RECOMMENDATION_SCORE = RECOMMENDATION_SCORE[~USER_RATING.notna()]
```

The idea behind this line is **right**: it removes the movies Rahul has already rated, but only if unrated movies are stored as NaN.

🔎 **Check it**:

```python
print(USER_RATING.loc[["Barfi!", "Andhadhun", "RRR", "Kahaani", "Piku"]])
```

If the values here are **NaN**, Rahul **had not rated** these movies → recommending them is correct.

If ratings show up (like 4 or 5), something is wrong and the filtering step needs another look. That is the case here: the `USER_RATING` output above gives 5, 5, 5, 4 and 5, so all five were already rated, and the filter had been commented out.



### 📣 Summary (1 Line):

> Rahul should be **recommended only the movies he has not rated yet**, so filter with `USER_RATING.isna()` after converting the 0 placeholders to NaN, or equivalently with `USER_RATING == 0`.

---


**Great question!** What if:

> 🔴 *Rahul has already rated every movie,*
> then **what should we recommend?**

---

### ✅ 1. **Nothing gets recommended**, and that is the **correct behaviour**.

* Your model only recommends **unrated movies**.
* If the user has **rated every movie**, **no new recommendation is possible**.



### 🤔 So what can you do?

#### 🛠 Option 1: **Add new movies to the data**

* Add new movies to the dataset.
* The model can then compute their similarity and recommend them, once some users have rated them.



#### 🛠 Option 2: **Show the most popular movies**

If the user has rated everything:

```python
import numpy as np

# Fallback logic: highest average rating across all users (0 treated as "not rated")
POPULAR_ITEMS = DATA.set_index('USER_ID1').replace(0, np.nan).mean().sort_values(ascending=False).head(5)
print("Popular movies you might like:\n", POPULAR_ITEMS)
```



#### 🛠 Option 3: **Use content-based filtering**

When new movies arrive that nobody has rated yet:

* Look at each movie's genre, actors and director, and recommend content similar to what the user liked.
* This **content-based** approach can serve as a backup alongside the item-based model, because it needs no ratings for the new movie.
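Here is a minimal content-based sketch. It ranks movies by cosine similarity of their feature vectors to a movie the user liked. The genre tags, the two "New ..." movies and the helper `similar_by_content` are all hypothetical; the notebook's dataset contains only ratings.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre tags (1 = movie has the tag); not part of the ratings dataset
features = pd.DataFrame(
    {"thriller": [1, 1, 0, 0], "comedy": [0, 0, 1, 1], "drama": [1, 0, 1, 1]},
    index=["Andhadhun", "New Thriller", "Piku", "New Comedy"],
)

def similar_by_content(features: pd.DataFrame, liked: str, n: int = 2) -> pd.Series:
    """Rank other movies by cosine similarity of their feature vectors to a liked movie."""
    sim = pd.DataFrame(cosine_similarity(features), index=features.index, columns=features.index)
    return sim[liked].drop(liked).sort_values(ascending=False).head(n)

print(similar_by_content(features, "Andhadhun"))
```

"New Thriller" ranks first even though no user has rated it, which is exactly the case where rating-based CF has nothing to work with.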



### 📌 Final Word:

> If Rahul has **rated every movie**, the recommendation model has **nothing left to recommend**, so either **add new items** or use a **popularity or content-based fallback**.





### 🔹 In short:

> **Yes, the data should contain null values (NaN)**,
> so that the model understands that **the user has not watched that movie**.



### 🎯 Meaning:

* **NaN for Rahul** ⇒ he has not watched the movie → it can be recommended
* **Ratings from other users** ⇒ enough to compute similarity



### ✔️ Example:

| Movie    | Rahul  | Priya | Aman |
| -------- | ------ | ----- | ---- |
| Queen    | 5      | 4     | 5    |
| Drishyam | NaN ✅ | 5     | 4    |

Here Rahul has **not watched Drishyam** → it can be recommended.



That's all! A clear "not rated" marker is **essential** for recommendations to work correctly. NaN is the natural choice, but scikit-learn's `cosine_similarity` rejects NaN input, so fill the NaN values (e.g. with `.fillna(0)`) just before computing similarities and keep the NaN version for the "not rated yet" filter.
