In [51]:
import pandas as pd

In [52]:
# Load dataset
df = pd.read_csv('/content/anime.csv')  # Replace with actual file path


In [53]:
df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [55]:
df.describe()

Unnamed: 0,anime_id,rating,members
count,12294.0,12064.0,12294.0
mean,14058.221653,6.473902,18071.34
std,11455.294701,1.026746,54820.68
min,1.0,1.67,5.0
25%,3484.25,5.88,225.0
50%,10260.5,6.57,1550.0
75%,24794.5,7.18,9437.0
max,34527.0,10.0,1013917.0


In [56]:
df.isnull().sum()

Unnamed: 0,0
anime_id,0
name,0
genre,62
type,25
episodes,0
rating,230
members,0


In [57]:
# Option to fill missing values (depends on the dataset)
df['rating'] = df['rating'].fillna(df['rating'].mean())  # Replace NaNs with mean rating

In [58]:
# Split genres by comma, apply one-hot encoding
df['genres'] = df['genre'].str.split(', ')
df_genres = df['genres'].str.join('|').str.get_dummies()
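
The split-and-rejoin step above can also be written with the `sep` argument of `Series.str.get_dummies`. A minimal sketch on a made-up three-row Series (the toy values are an assumption, not rows from `anime.csv`); a row with a missing genre comes out as all zeros:

```python
import pandas as pd

# Toy genre column (illustrative values, not taken from anime.csv)
toy_genre = pd.Series(["Action, Comedy", None, "Drama, Action"])

# Split on ", " and one-hot encode in one call; the missing value becomes an all-zero row
toy_dummies = toy_genre.str.get_dummies(sep=", ")
print(toy_dummies)
```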


In [59]:
# Concatenate the one-hot encoded genres with the original DataFrame
df_combined = pd.concat([df, df_genres], axis=1)

In [60]:
from sklearn.preprocessing import MinMaxScaler

In [61]:
scaler = MinMaxScaler()
df_combined['rating'] = scaler.fit_transform(df_combined[['rating']])

In [62]:
from sklearn.metrics.pairwise import cosine_similarity

In [15]:
# Define the features to compute similarity on
features = df_genres.columns.tolist() + ['rating']

In [63]:
# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(df_combined[features])

In [64]:
# Define a function to recommend anime
def recommend_anime(anime_title, similarity_matrix, df, top_n=5):
    # Ensure the anime title exists (titles are stored in the 'name' column)
    if anime_title in df['name'].values:
        # Get the index of the first row with that title
        idx = df[df['name'] == anime_title].index[0]

        # Get similarity scores for the selected anime
        sim_scores = list(enumerate(similarity_matrix[idx]))

        # Sort anime by similarity scores (highest to lowest)
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

        # Get indices of the top_n most similar anime (skipping the first entry,
        # which is normally the target anime itself with similarity 1.0)
        sim_scores = sim_scores[1:top_n+1]

        # Return titles of the similar anime
        anime_indices = [i[0] for i in sim_scores]
        return df['name'].iloc[anime_indices]
    else:
        raise ValueError(f"Anime title '{anime_title}' not found in the dataset.")
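
To see what a similarity lookup of this kind returns, here is a minimal, self-contained sketch on a four-row toy table. The toy titles and genre flags and the helper name `top_similar` are made up for illustration. It ranks with `numpy.argsort` and drops the query row explicitly instead of assuming it sorts first.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue (hypothetical titles and genre flags)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "Action": [1, 1, 0, 0],
    "Comedy": [0, 1, 1, 0],
    "Drama": [0, 0, 1, 1],
})
toy_sim = cosine_similarity(toy[["Action", "Comedy", "Drama"]])

def top_similar(title, sim, frame, top_n=2):
    """Return the top_n names most similar to `title`, excluding `title` itself."""
    matches = frame.index[frame["name"] == title]
    if len(matches) == 0:
        raise ValueError(f"Anime title '{title}' not found in the dataset.")
    pos = frame.index.get_loc(matches[0])          # positional row of the query
    order = np.argsort(-sim[pos], kind="stable")   # highest similarity first
    order = order[order != pos][:top_n]            # drop the query row itself
    return frame["name"].iloc[order].tolist()

print(top_similar("B", toy_sim, toy))  # ['A', 'C']
```

In this toy table "B" shares one of its two genres with "A" (which has only that genre) and one with "C" (which has two), so "A" ranks ahead of "C".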

In [65]:
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

In [66]:
train_data, test_data = train_test_split(df_combined, test_size=0.2, random_state=42)

In [67]:
# Example dummy labels for demonstration
true_labels = [1, 0, 1, 0, 1]  # Binary relevance labels for test anime
predicted_labels = [1, 0, 1, 1, 1]  # Predicted relevance labels


In [68]:
# Macro-averaged metrics: computed separately for class 0 and class 1, then averaged with equal weight
precision = precision_score(true_labels, predicted_labels, average='macro')
recall = recall_score(true_labels, predicted_labels, average='macro')
f1 = f1_score(true_labels, predicted_labels, average='macro')

In [69]:
print(f'Precision: {precision}, Recall: {recall}, F1-Score: {f1}')

Precision: 0.875, Recall: 0.75, F1-Score: 0.7619047619047619
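
The dummy labels above are not tied to the recommender's output. One common way to connect the two is precision@k and recall@k over a ranked recommendation list. The sketch below assumes a hypothetical set of relevant titles is available for each query; how relevance is defined (shared genres, held-out user ratings, and so on) is left open here, and the function name is illustrative.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision_k = hits / k if k else 0.0
    recall_k = hits / len(relevant) if relevant else 0.0
    return precision_k, recall_k

# Hypothetical ranked recommendations and ground-truth relevant set
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

# Two of the five recommendations are relevant, out of three relevant titles overall
print(precision_recall_at_k(recommended, relevant, k=5))
```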


# Interview Questions:

1. Can you explain the difference between user-based and item-based collaborative filtering?
User-based and item-based collaborative filtering are two collaborative filtering techniques used in recommendation systems. Both rely on similarity computed from user-item interactions; they differ in whether that similarity is measured between users or between items.

User-Based Collaborative Filtering:
- Focus: similarities between users.
- Approach: recommend items to a user based on the preferences of users who are similar to them. For example, if User A and User B have similar tastes, and User B likes a particular anime that User A hasn’t seen yet, that anime could be recommended to User A.

How it works:
- First, the algorithm calculates the similarity between users (based on their ratings, preferences, or interactions).
- Then, it finds the users most similar to the target user.
- Finally, it recommends items that these similar users liked but the target user hasn’t interacted with yet.

Pros:
- Works well when users have enough historical interaction data.
- Can capture niche tastes when a user has close neighbours with the same preferences.

Cons:
- Suffers from the cold-start problem when there are new users with little or no data.
- If two users have little overlap in the items they’ve rated, their similarity estimate is unreliable and recommendation accuracy can drop.
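
The user-based steps above can be sketched in a few lines. This is a minimal illustration, not a production method: the rating matrix is made up, `predict_user_based` is a hypothetical helper, and treating unrated entries as 0 inside cosine similarity is a simplification (mean-centering ratings first is a common refinement).

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],   # target user, has not rated item 2
    [5, 5, 4, 1],   # similar tastes to the target
    [1, 1, 2, 5],   # different tastes
], dtype=float)

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    sims = cosine_similarity(ratings)[user]
    # Only users other than the target who actually rated the item contribute
    others = [u for u in range(ratings.shape[0]) if u != user and ratings[u, item] > 0]
    if not others:
        return float("nan")
    weights = sims[others]
    values = ratings[others, item]
    return float(weights @ values / weights.sum())

print(predict_user_based(ratings, user=0, item=2))
```

The prediction lands between the two neighbours' ratings (4 and 2) and closer to 4, because the second user is far more similar to the target than the third.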
Item-Based Collaborative Filtering:
- Focus: similarities between items.
- Approach: recommend items similar to the ones a user has already liked or interacted with. For example, if a user likes a certain anime, the system will recommend other anime that were rated similarly by the same users.

How it works:
- First, the algorithm calculates the similarity between items (based on user interactions or ratings).
- Then, for a given user, it looks at the items the user has interacted with and finds similar items using the similarity matrix.
- Finally, it recommends those similar items to the user.

Pros:
- More stable over time, as item similarities tend to change less frequently than user preferences.
- It can handle scenarios where a user has only interacted with a few items (better than user-based in sparse datasets).

Cons:
- Suffers from the cold-start problem when new items are introduced, as there is little data to calculate similarity.
- Items with few raters, or with raters that do not overlap, get unreliable similarity scores, so niche items are hard to match.
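
The item-based steps above can be sketched the same way. Again this is a minimal illustration under stated assumptions: the rating matrix is made up, `recommend_item_based` is a hypothetical helper, and unrated entries are stored as 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated"
ratings = np.array([
    [5, 0, 4, 0, 1],   # target user, has not rated items 1 and 3
    [4, 5, 5, 1, 1],
    [1, 1, 2, 5, 5],
    [5, 4, 4, 0, 1],
], dtype=float)

# Item-item similarity compares columns, so transpose the matrix
item_sim = cosine_similarity(ratings.T)

def recommend_item_based(ratings, item_sim, user, top_n=1):
    """Rank unrated items by their similarity to the items the user has rated."""
    rated = ratings[user] > 0
    # Each candidate's score: sum of (similarity to a rated item) * (user's rating of it)
    scores = item_sim[:, rated] @ ratings[user, rated]
    scores[rated] = -np.inf   # never recommend something already rated
    return np.argsort(-scores)[:top_n].tolist()

print(recommend_item_based(ratings, item_sim, user=0, top_n=2))  # [1, 3]
```

Item 1 ranks ahead of item 3 because it was rated highly by the same users who liked items 0 and 2, which the target user also rated highly.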




2. What is collaborative filtering, and how does it work?
Collaborative filtering is a technique used in recommendation systems to suggest items to users based on the interactions of other users in the system. The key idea is that users who have agreed in the past (e.g., rated the same items similarly) tend to agree in the future.

How It Works:
Collaborative filtering relies on two main ingredients:

- User-Item Interactions: This can be explicit feedback (e.g., ratings) or implicit feedback (e.g., clicks, watch time, purchase history).
- Similarity: It tries to find patterns of similarity, either between users or between items, to make recommendations.

There are two main types of collaborative filtering:

User-Based Collaborative Filtering:
- This method looks for similarities between users based on their behavior (e.g., ratings or interactions with items).
- Once similar users are identified, recommendations are made by suggesting items that the similar users liked but the target user hasn’t interacted with yet.
- Example: In a movie recommendation system, if two users have given similar ratings to the same set of movies, they are considered similar, and a movie liked by one user is recommended to the other.

Item-Based Collaborative Filtering:
- This method looks for similarities between items based on users’ interactions with them.
- For a given user, the system identifies items they have liked and finds other items similar to them, then recommends those items to the user.
- Example: In an e-commerce setting, if you bought a certain product, the system will recommend other products that are frequently bought by the same customers who bought it.

Types of Collaborative Filtering Algorithms:

Memory-Based Collaborative Filtering:
- This approach uses the entire user-item interaction matrix to compute similarities either between users or items.
- Similarity measures such as cosine similarity, Pearson correlation, or Euclidean distance are commonly used.
- Example: The system will calculate the similarity between users or items using past interactions directly and recommend based on those similarities.
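
To make the three measures named above concrete, here is a small sketch comparing them on two made-up rating vectors. The vectors are assumptions chosen so that the second user rates every item exactly one point lower than the first.

```python
import numpy as np

# Two hypothetical users' ratings for the same three items; b is a shifted by -1
a = np.array([5.0, 4.0, 2.0])
b = np.array([4.0, 3.0, 1.0])

# Cosine similarity: angle between the raw rating vectors
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pearson correlation: cosine similarity after subtracting each user's mean rating
pearson = float(np.corrcoef(a, b)[0, 1])

# Euclidean distance: smaller means more similar (a distance, not a similarity)
euclidean = float(np.linalg.norm(a - b))

print(cosine, pearson, euclidean)
```

Pearson correlation treats the two users as perfectly aligned because it ignores the constant offset between them, while cosine similarity and Euclidean distance both register the difference in rating level.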

Model-Based Collaborative Filtering:
- Instead of relying directly on the user-item interaction matrix, this approach uses machine learning models (such as matrix factorization, for example SVD, or neural networks) to learn patterns and make recommendations.
- These models try to predict a user’s rating or interaction with an item by learning latent factors or embeddings.
- Example: Matrix factorization (e.g., SVD) can reduce the interaction matrix into lower-dimensional representations of users and items, which can be used to predict missing interactions (i.e., which items a user might like).
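
The matrix factorization idea above can be illustrated with a truncated SVD in NumPy. This is a rough sketch under a strong simplifying assumption: unrated entries are stored as 0 and factorized as if they were real ratings. Practical recommenders usually fit the factors only to observed entries (for example with ALS or SGD). The matrix and the choice of `k = 2` are made up.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 5, 1],
    [1, 1, 2, 5],
    [5, 5, 4, 0],
], dtype=float)

# Full SVD, then keep only the k largest singular values (latent factors)
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
user_factors = U[:, :k] * s[:k]   # one k-dimensional vector per user
item_factors = Vt[:k, :]          # one k-dimensional vector per item

# Low-rank reconstruction: entries that were 0 now hold a predicted score
approx = user_factors @ item_factors
print(np.round(approx, 2))
```

Each predicted score is the dot product of a user's factor vector with an item's factor vector, which is the sense in which the model "learns latent factors".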

Advantages:
- Doesn’t require explicit item attributes (such as genres, price, etc.); it needs only user-item interactions.
- Works well when there is plenty of interaction data.

Challenges:
- Cold-Start Problem: When a new user or item enters the system, there isn’t enough historical data to make accurate recommendations.
- Sparsity: Many real-world datasets are sparse, meaning most users have interacted with only a small portion of items, making it hard to find similarities.
- Scalability: As the number of users and items grows, computing similarities and making recommendations can become computationally expensive.

How it Works in a Step-by-Step Fashion:
- Data Collection: Collect user-item interaction data, either through explicit feedback (e.g., ratings) or implicit feedback (e.g., clicks, purchase history).
- Similarity Calculation: In user-based filtering, calculate similarity between users. In item-based filtering, calculate similarity between items.
- Prediction/Recommendation: In user-based filtering, recommend items based on what similar users have liked. In item-based filtering, recommend items similar to the ones a user has liked.
- Evaluation: Use metrics such as precision, recall, or RMSE (Root Mean Square Error) to evaluate the quality of the recommendations.

Collaborative filtering is widely used in applications like movie recommendations (Netflix), product suggestions (Amazon), and social media content suggestions.
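
For the evaluation step, RMSE compares predicted ratings with held-out actual ratings. A minimal sketch with made-up numbers (the `rmse` helper and both lists are illustrative assumptions):

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical held-out ratings and the model's predictions for them
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]

# Squared errors are 0.25, 0, 1, 0.25, so the mean is 0.375
print(rmse(actual, predicted))
```

Lower is better, and because errors are squared before averaging, the single one-point miss dominates the result.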






