<a href="https://colab.research.google.com/github/cate0123/recommender_system/blob/main/Recommender_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install pandas scikit-learn




# **COLLABORATIVE FILTERING**

1. Load dataset

In [None]:
import pandas as pd

from google.colab import files
uploaded = files.upload()

df = pd.read_csv("movies.csv")
print(df.head())


Saving movies.csv to movies.csv
   user_id        movie_title  rating 
0        1        Toy  Story         5
1        1            Jumanji        3
2        2        Toy  Story         4
3        2  Grumpier Old Men         5
4        3            Jumanji        4


In [None]:
df.columns = df.columns.str.strip()


user_item_matrix = df.pivot_table(index="user_id", columns="movie_title", values="rating")
print(user_item_matrix)

movie_title  Grumpier Old Men   Jumanji  Toy  Story 
user_id                                             
1                          NaN      3.0          5.0
2                          5.0      NaN          4.0
3                          NaN      4.0          NaN


Apply Collaborative Filtering (Cosine Similarity)

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Fill NaN with 0 for similarity calculation
matrix_filled = user_item_matrix.fillna(0)

# Compute similarity between users
similarity = cosine_similarity(matrix_filled)
print("User Similarity Matrix:\n", similarity)

User Similarity Matrix:
 [[1.         0.53567158 0.51449576]
 [0.53567158 1.         0.        ]
 [0.51449576 0.         1.        ]]
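As a sanity check, the pairwise similarities can be recomputed by hand. This is a minimal sketch using only NumPy. The rating vectors are copied from the user-item matrix printed earlier (columns ordered Grumpier Old Men, Jumanji, Toy Story), with missing ratings set to 0 as in the cell above. The helper name `cosine` is only illustrative.

```python
import numpy as np

# Rating vectors copied from the user-item matrix above (NaN -> 0).
# Column order: Grumpier Old Men, Jumanji, Toy Story
user_1 = np.array([0.0, 3.0, 5.0])
user_2 = np.array([5.0, 0.0, 4.0])
user_3 = np.array([0.0, 4.0, 0.0])

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_12 = cosine(user_1, user_2)  # 20 / (sqrt(34) * sqrt(41))
sim_13 = cosine(user_1, user_3)  # 12 / (sqrt(34) * 4)
sim_23 = cosine(user_2, user_3)  # no movie in common, so the dot product is 0

print(round(sim_12, 4), round(sim_13, 4), round(sim_23, 4))
```

Users 2 and 3 have no rated movie in common, which is why their similarity is exactly 0.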


 Make Recommendations

In [None]:
# Example: Recommend for user 1 based on most similar user
import numpy as np

user_index = 0  # user_id = 1
similar_users = similarity[user_index]

# Find the most similar user (excluding self)
most_similar_user = np.argsort(similar_users)[-2]

# Get movies rated by most similar user
recommended_movies = user_item_matrix.iloc[most_similar_user].dropna().index.tolist()
print(f"Recommended movies for User 1: {recommended_movies}")

Recommended movies for User 1: ['Grumpier Old Men ', 'Toy  Story ']
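The list above includes Toy Story, which User 1 has already rated. One possible refinement, sketched below, keeps only the movies that the similar user rated and the target user has not. The small DataFrame is rebuilt from the matrix printed earlier so the block runs on its own. `target_idx`, `neighbor_idx`, and `unseen` are illustrative names, and the titles are written without the stray spaces seen in the raw file.

```python
import numpy as np
import pandas as pd

# User-item matrix rebuilt from the output shown earlier (NaN = not rated).
user_item = pd.DataFrame(
    {
        "Grumpier Old Men": [np.nan, 5.0, np.nan],
        "Jumanji": [3.0, np.nan, 4.0],
        "Toy Story": [5.0, 4.0, np.nan],
    },
    index=pd.Index([1, 2, 3], name="user_id"),
)

target_idx = 0    # row position of user_id = 1
neighbor_idx = 1  # row position of the most similar user (user_id = 2)

# Movies the neighbor rated and the target user has not rated yet
neighbor_rated = user_item.iloc[neighbor_idx].notna()
target_unrated = user_item.iloc[target_idx].isna()
unseen = user_item.columns[(neighbor_rated & target_unrated).to_numpy()].tolist()

print(f"Unseen movies for User 1: {unseen}")
```

Filtering by the target user's own ratings keeps the recommendation list limited to titles that are new to that user.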


# **CONTENT BASED FILTERING**

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

1. Load your dataset

In [None]:
from google.colab import files
uploaded = files.upload()

df = pd.read_csv("disney_movies.csv")
print(df.head())

Saving disney_movies.csv to disney_movies (1).csv
                       movie_title release_date      genre mpaa_rating  \
0  Snow White and the Seven Dwarfs   1937-12-21    Musical           G   
1                        Pinocchio   1940-02-09  Adventure           G   
2                         Fantasia   1940-11-13    Musical           G   
3                Song of the South   1946-11-12  Adventure           G   
4                       Cinderella   1950-02-15      Drama           G   

   total_gross  inflation_adjusted_gross  
0    184925485                5228953251  
1     84300000                2188229052  
2     83320000                2187090808  
3     65000000                1078510579  
4     85000000                 920608730  


2. Create a 'features' column

In [None]:
# Pick the richest text feature available:
# genres + description, then genres alone, then the title as a last resort.
if "genres" in df.columns and "description" in df.columns:
    df["features"] = df["genres"].fillna('') + " " + df["description"].fillna('')
elif "genres" in df.columns:
    df["features"] = df["genres"].fillna('')
else:
    df["features"] = df["movie_title"]
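The `head()` output above lists a `genre` column (singular) and no `description`, so with this file the final `else` branch runs and the features are just the titles. A hedged alternative is sketched below on the five rows printed earlier, assuming `genre` and `mpaa_rating` are the attributes worth comparing. The `features_alt` column name is hypothetical, and the resulting similarities are not taken from the original notebook run. The `rated_` prefix is there because the vectorizer's default tokenizer drops single-character tokens such as `G`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Five rows copied from the head() output above
sample = pd.DataFrame(
    {
        "movie_title": [
            "Snow White and the Seven Dwarfs", "Pinocchio", "Fantasia",
            "Song of the South", "Cinderella",
        ],
        "genre": ["Musical", "Adventure", "Musical", "Adventure", "Drama"],
        "mpaa_rating": ["G", "G", "G", "G", "G"],
    }
)

# Assumption: combine genre and MPAA rating into one text feature per movie
sample["features_alt"] = (
    sample["genre"].fillna("") + " rated_" + sample["mpaa_rating"].fillna("")
)

vectors = TfidfVectorizer().fit_transform(sample["features_alt"])
sim = cosine_similarity(vectors)

# Movies sharing a genre should score higher than movies that do not
print(sim.round(2))
```

In this sample, Snow White and Fantasia share both genre and rating, so they end up with identical feature vectors, while movies that share only the rating score lower.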


3. Vectorize features

In [None]:
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(df["features"])

4. Compute similarity (cosine)

In [None]:
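cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
# Map title -> row position, keeping the first row when a title repeats
indices = pd.Series(df.index, index=df["movie_title"])
indices = indices[~indices.index.duplicated(keep="first")]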


Movie recommender function

In [None]:
def recommend_movies(title, n=5):
    """Return the n titles most similar to `title` by cosine similarity."""
    if title not in indices:
        return f"❌ Movie '{title}' not found in dataset."

    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Drop the query movie by position rather than assuming it sorts first
    sim_scores = [s for s in sim_scores if s[0] != idx][:n]
    movie_indices = [i[0] for i in sim_scores]

    return df["movie_title"].iloc[movie_indices].tolist()

print("Recommendations for 'Frozen':")
print(recommend_movies("Frozen", 5))

Recommendations for 'Frozen':
['Snow White and the Seven Dwarfs', 'Pinocchio', 'Fantasia', 'Song of the South', 'Cinderella']


**Content-Based Filtering vs Collaborative Filtering**

**Content-Based Filtering**

1. CBF recommends items based on their attributes or features (e.g., genre, director, keywords).
2. CBF doesn't rely on other users' interaction data, so it can handle new items and users with little rating history.
3. CBF recommends items similar to the ones a user has liked or interacted with.

**Collaborative filtering**

1. CF recommends items based on the behavior of similar users (e.g., ratings, clicks, purchases).
2. CF relies on user interaction data to identify patterns and make recommendations.
3. CF recommends items liked or interacted with by similar users.

Collaborative filtering worked better because it was able to capture patterns in user behavior and generate more personalized recommendations. While content-based filtering relied too heavily on available metadata (genres and descriptions), which limited its diversity and accuracy, collaborative filtering leveraged user–item interactions and produced suggestions that aligned more closely with actual user preferences. However, its effectiveness was still challenged by sparse data and the cold start problem.

Challenges


**Collaborative Filtering**

1. Faced sparse data in the user–item matrix, with many missing ratings (NaN values) that had to be replaced with 0, which reduced accuracy.
2. Struggled with the cold start problem, since new users or movies without prior ratings could not be recommended effectively.
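
One common way to soften the effect of filling missing ratings with 0 is to subtract each user's mean rating first, so that a missing entry reads as neutral rather than strongly disliked. The sketch below applies that idea to the three-user matrix from the collaborative filtering section. It illustrates the technique and is not something the notebook ran; `centered` and `cosine` are illustrative names.

```python
import numpy as np
import pandas as pd

# User-item matrix rebuilt from the output shown earlier (NaN = not rated).
user_item = pd.DataFrame(
    {
        "Grumpier Old Men": [np.nan, 5.0, np.nan],
        "Jumanji": [3.0, np.nan, 4.0],
        "Toy Story": [5.0, 4.0, np.nan],
    },
    index=pd.Index([1, 2, 3], name="user_id"),
)

# Subtract each user's mean over the movies they actually rated
# (NaN entries are skipped by mean), then treat missing entries as 0 = neutral.
user_means = user_item.mean(axis=1)
centered = user_item.sub(user_means, axis=0).fillna(0.0)

def cosine(a, b):
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

sim_12 = cosine(centered.loc[1].to_numpy(), centered.loc[2].to_numpy())
sim_13 = cosine(centered.loc[1].to_numpy(), centered.loc[3].to_numpy())

print(centered)
print(sim_12, sim_13)
```

User 3 has a single rating, so that user's centered vector is all zeros and carries no signal. Mean-centering addresses the zero-fill bias but not sparsity itself.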

**Content-Based Filtering**

1. Depended heavily on item metadata; missing or incomplete features (like genres or description) limited recommendation quality.
2. Generated less diverse recommendations because it focused mainly on item similarity, often suggesting very similar movies.