# Anime Recommendation System Using Cosine Similarity

In [1]:
# -----------------------------------------------
# STEP 1: IMPORT NECESSARY LIBRARIES
# -----------------------------------------------

import pandas as pd
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer, MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')

In [5]:
# -----------------------------------------------
# STEP 2: LOAD THE DATASET
# -----------------------------------------------

# Load the Anime dataset (update the path if needed)
df = pd.read_csv('anime.csv')

# Show the first few rows of the dataset
print("\nFirst 5 rows of the dataset:")
print(df.head())


First 5 rows of the dataset:
   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  


In [6]:
# -----------------------------------------------
# STEP 3: DATA EXPLORATION AND CLEANING
# -----------------------------------------------

In [7]:
# Check for missing values and data types
print("\nDataset info:")
print(df.info())


Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
None


In [8]:
# Check the number of missing values in each column
print("\nMissing values in each column:")
print(df.isnull().sum())


Missing values in each column:
anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64


In [9]:
# Handle missing genres by replacing with 'Unknown'
df['genre'] = df['genre'].fillna('Unknown')

In [13]:
# Handle missing media types ('type' column, e.g. TV or Movie) by replacing with 'Unknown'
df['type'] = df['type'].fillna('Unknown')

In [10]:
# Handle missing ratings by replacing them with the mean rating
df['rating'] = df['rating'].fillna(df['rating'].mean())

In [11]:
# Replace 'Unknown' episode counts with 0 (a placeholder, not a real count), then convert to int
df['episodes'] = df['episodes'].replace('Unknown', '0').astype(int)

In [15]:
# Verify that no missing values remain
print(df.isnull().sum())

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64


In [16]:
# -----------------------------------------------
# STEP 4: FEATURE EXTRACTION AND TRANSFORMATION
# -----------------------------------------------

In [17]:
# Split genre strings into lists so that we can apply MultiLabelBinarizer
df['genre'] = df['genre'].apply(lambda x: x.split(', ') if isinstance(x, str) else [])

In [18]:
# Convert genre lists to binary features using MultiLabelBinarizer
mlb = MultiLabelBinarizer()
genre_features = pd.DataFrame(mlb.fit_transform(df['genre']), columns=mlb.classes_)

In [19]:
# Normalize the 'rating' and 'members' columns to bring them to the same scale
scaler = MinMaxScaler()
df[['rating', 'members']] = scaler.fit_transform(df[['rating', 'members']])

In [20]:
# Combine the genre binary features with normalized rating and members columns
feature_df = pd.concat([genre_features, df[['rating', 'members']].reset_index(drop=True)], axis=1)

In [21]:
# -----------------------------------------------
# STEP 5: BUILD THE RECOMMENDATION SYSTEM
# -----------------------------------------------

In [22]:
# Compute the cosine similarity matrix for all anime
cosine_sim = cosine_similarity(feature_df)
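
For intuition, the cosine similarity of two feature vectors is their dot product divided by the product of their lengths. The sketch below reproduces that calculation with plain NumPy on three made-up feature rows. The helper name `cosine_sim_matrix` and the toy values are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for all-zero rows
    X_unit = X / norms               # scale every row to unit length
    return X_unit @ X_unit.T         # dot products of unit vectors = cosines

# Toy rows: [Action, Comedy, Romance, scaled rating, scaled members] (made-up values)
toy_features = np.array([
    [1, 1, 0, 0.80, 0.60],
    [1, 1, 0, 0.75, 0.10],
    [0, 0, 1, 0.90, 0.20],
])
toy_sim = cosine_sim_matrix(toy_features)
print(np.round(toy_sim, 3))
```

In this toy example the first two rows share both genres and end up far more similar to each other than to the third row, even though their `members` values differ a lot. With many binary genre columns next to only two scaled numeric columns, genre overlap will likely dominate the similarity in the real feature matrix as well.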

In [23]:
# Map anime titles to their row index, keeping the first row for any duplicated title
anime_indices = pd.Series(df.index, index=df['name'])
anime_indices = anime_indices[~anime_indices.index.duplicated(keep='first')]

In [24]:
# Function to recommend anime based on cosine similarity
def recommend_anime(title, top_n=5):
    """
    Recommend anime similar to the given title based on cosine similarity.
    Args:
        title (str): The anime title for which recommendations are needed.
        top_n (int): The number of similar anime to return.
    Returns:
        pd.Series: Titles of recommended anime, or a message string if the title is not found.
    """
    if title not in anime_indices:
        return f"Anime '{title}' not found in the dataset."

    idx = anime_indices[title]
    # Pair every other anime's index with its similarity to the target (skip the target itself)
    sim_scores = [(i, score) for i, score in enumerate(cosine_sim[idx]) if i != idx]

    # Sort by similarity score in descending order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Take the top N
    sim_scores = sim_scores[:top_n]

    # Get the indices of the recommended anime
    anime_indices_top = [i[0] for i in sim_scores]

    return df['name'].iloc[anime_indices_top]

In [25]:
# Example usage
print("\nExample Recommendations for 'Naruto':")
print(recommend_anime('Naruto', top_n=5))


Example Recommendations for 'Naruto':
615                                    Naruto: Shippuuden
1472          Naruto: Shippuuden Movie 4 - The Lost Tower
1573    Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...
486                              Boruto: Naruto the Movie
1343                                          Naruto x UT
Name: name, dtype: object


In [26]:
# -----------------------------------------------
# STEP 6: EVALUATION (BASIC PRECISION@K)
# -----------------------------------------------

In [27]:
# Simplified evaluation based on genre overlap
def evaluate_recommendation(title, top_n=5):
    """
    Evaluate recommendations based on whether recommended anime share genres with the original anime.
    Args:
        title (str): The anime title for evaluation.
        top_n (int): Number of recommendations to evaluate.
    Returns:
        float: Precision at top_n recommendations.
    """
    if title not in anime_indices:
        return 0

    recommended = recommend_anime(title, top_n)

    # Get genres of the target anime
    original_genres = set(df.loc[anime_indices[title], 'genre'])

    matches = 0
    # Look up genres by row index so duplicated titles cannot resolve to the wrong row
    for rec_idx in recommended.index:
        rec_genres = set(df.loc[rec_idx, 'genre'])
        # Count matches if there is at least one common genre
        if original_genres & rec_genres:
            matches += 1

    precision = matches / top_n
    return precision

In [28]:
# Example evaluation
print("\nPrecision@5 for 'Naruto':")
print(evaluate_recommendation('Naruto', top_n=5))


Precision@5 for 'Naruto':
1.0
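
Note that the metric above counts a recommendation as a hit if it shares at least one genre with the query, which is a lenient criterion. A more conventional Precision@K compares the ranked list against a set of items known to be relevant. The sketch below shows that form; the function name `precision_at_k` and the title sets are hypothetical, since this notebook has no ground-truth relevance data.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

recommended_titles = ["A", "B", "C", "D", "E"]   # hypothetical ranked output
relevant_titles = {"A", "C", "E", "Z"}           # hypothetical ground truth
print(precision_at_k(recommended_titles, relevant_titles, k=5))  # 3 hits out of 5
```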


**Interview Questions:**

# 1. Can you explain the difference between user-based and item-based collaborative filtering?

Collaborative filtering is a technique used in recommendation systems. It works by finding patterns of preferences among users or items. There are two major types:

## ✅ 1. User-Based Collaborative Filtering
- **Concept:** Recommend items to a user based on the preferences of **similar users**.
- **Working:**
  - Find users who have a **similar taste or behavior**.
  - Recommend items liked by these similar users but **not yet interacted** with by the target user.
- **Example:**
  - If User A and User B both like Anime X and Anime Y, and User B also likes Anime Z, then Anime Z may be recommended to User A.
- **Pros:**
  - Easy to implement.
  - Works well in communities where user preferences are stable.
- **Cons:**
  - Struggles with **scalability** when there are many users.
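  - Suffers when user data is **sparse**, because few users have rated the same items; a new user with no history cannot be matched to anyone at all (cold-start problem).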
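
The user-based idea described above can be sketched in a few lines of NumPy. This is a minimal illustration on a made-up rating matrix, not a production method; the matrix values and the function name `user_based_scores` are assumptions introduced here.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = anime, 0 = not rated
ratings = np.array([
    [5, 4, 0, 0],   # user 0 (target)
    [5, 5, 4, 1],   # user 1: similar taste to user 0
    [1, 0, 2, 5],   # user 2: different taste
], dtype=float)

def user_based_scores(ratings, user, k=2):
    """Score unrated items for `user` from the k most similar users' ratings."""
    k = min(k, len(ratings) - 1)
    norms = np.linalg.norm(ratings, axis=1)
    norms[norms == 0] = 1.0
    sims = ratings @ ratings[user] / (norms * norms[user])  # cosine to every user
    sims[user] = -np.inf                      # never pick the user themself
    neighbours = np.argsort(sims)[::-1][:k]   # k most similar users

    weights = sims[neighbours]
    rated = ratings[neighbours] > 0           # only count neighbours who rated the item
    weighted_sum = (weights[:, None] * ratings[neighbours]).sum(axis=0)
    weight_total = (weights[:, None] * rated).sum(axis=0)
    scores = np.divide(weighted_sum, weight_total,
                       out=np.zeros_like(weighted_sum), where=weight_total > 0)
    scores[ratings[user] > 0] = -np.inf       # hide items the user already rated
    return scores

scores = user_based_scores(ratings, user=0)
print(int(np.argmax(scores)))  # index of the best unrated item for user 0
```

Here user 1 is by far the closest neighbour of user 0, so the item user 1 rated highly (column 2) outranks the one user 1 disliked (column 3).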

---

## ✅ 2. Item-Based Collaborative Filtering
- **Concept:** Recommend items similar to those the user has already liked or interacted with.
- **Working:**
  - Find **items that are similar** to the ones the user has already rated or interacted with.
  - Recommend these similar items to the user.
- **Example:**
  - If a user liked Anime A and Anime A is similar to Anime B (because many users liked both), recommend Anime B.
- **Pros:**
  - Often more **scalable**: many catalogs have fewer items than users, and item-item similarities can be precomputed offline and reused.
  - Recommendations are **more stable over time**, since item similarity changes less frequently.
- **Cons:**
  - Needs a sufficient amount of item interaction data to build accurate similarity scores.
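
A comparable sketch for the item-based variant follows. It compares the columns of a made-up rating matrix and ranks unrated items by a similarity-weighted sum of the user's own ratings. The data and the helper names `item_similarity` and `item_based_scores` are hypothetical, and the score is a ranking signal rather than a predicted rating.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = anime, 0 = not rated
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 5, 1],
    [5, 4, 4, 0],
    [1, 0, 1, 5],
], dtype=float)

def item_similarity(ratings):
    """Cosine similarity between the item columns of the rating matrix."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0
    unit = ratings / norms
    return unit.T @ unit

def item_based_scores(ratings, item_sim, user):
    """Rank unrated items by similarity to what the user already rated."""
    user_ratings = ratings[user]
    rated = user_ratings > 0
    # Each item's score: sum of (similarity to a rated item) * (user's rating of it)
    scores = item_sim[:, rated] @ user_ratings[rated]
    scores[rated] = -np.inf                # hide items the user already rated
    return scores

item_sim = item_similarity(ratings)        # can be computed once and reused for all users
scores = item_based_scores(ratings, item_sim, user=0)
print(int(np.argmax(scores)))              # index of the best unrated item for user 0
```

Because `item_sim` depends only on the rating matrix and not on the target user, it can be cached, which is one reason this variant is often considered easier to scale.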

---

## 📊 Summary Table

| Aspect                            | User-Based Filtering                          | Item-Based Filtering                        |
|------------------------------------|-----------------------------------------------|--------------------------------------------|
| Based on                           | Similar users' preferences                    | Similarity between items                   |
| Example                            | Recommend what similar users liked            | Recommend items similar to what you liked  |
| Scalability                        | Less scalable with many users                 | More scalable                              |
| Cold-start Problem                 | Affects new users                             | Affects new items                          |
| Data Dependency                    | Depends on user similarity                    | Depends on item similarity                 |

---

# 2. What is Collaborative Filtering and How Does it Work?

## ✅ Collaborative Filtering: Overview

Collaborative Filtering is one of the most widely used techniques in recommendation systems.  
It makes automatic predictions (filtering) about the interests of a user by collecting preferences from **many users** (hence *collaborative*).

---

## 🔍 How Collaborative Filtering Works:

Collaborative Filtering is based on the assumption:
> "**If users agreed in the past, they will likely agree again in the future.**"

It works in the following way:

### **Step 1: Collect User-Item Interactions**
- Example: A user rating an anime, purchasing a product, watching a movie, etc.
- This data is stored in a **user-item interaction matrix**, where rows = users, columns = items.
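
As a rough illustration of Step 1, the snippet below pivots a long-format ratings log into a user-item matrix with pandas. The column names and values are made up; note that the `anime.csv` file used earlier contains only item-level attributes, so a separate per-user ratings table is assumed here.

```python
import pandas as pd

# Hypothetical long-format ratings log (one row per user-anime rating)
ratings_long = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2, 3],
    "anime_id": [20, 21, 20, 22, 21, 22],
    "rating":   [8, 9, 7, 10, 6, 5],
})

# Rows = users, columns = items; NaN marks "no interaction"
user_item = ratings_long.pivot_table(index="user_id", columns="anime_id", values="rating")
print(user_item)
```

Keeping missing entries as `NaN` rather than 0 preserves the difference between "not rated" and "rated very low"; whether to fill them, and with what value, is a modelling choice.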

### **Step 2: Calculate Similarities**
- Similarities can be calculated in two ways:
  - **User Similarity:** How similar are two users based on their preferences?
  - **Item Similarity:** How similar are two items based on user preferences?

### **Step 3: Predict User Preferences**
- Based on similarity:
   - Recommend items that similar users have liked (**User-Based Collaborative Filtering**).
   - Recommend items similar to the ones the user liked (**Item-Based Collaborative Filtering**).

### **Step 4: Generate Recommendations**
- Recommend top **N items** with the highest predicted ratings or interaction likelihood.
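
Step 4 usually amounts to sorting the predicted scores and dropping anything the user has already seen. A minimal sketch, with a hypothetical helper `top_n_items` and made-up scores:

```python
import numpy as np

def top_n_items(scores, already_seen, n=3):
    """Return indices of the n highest-scoring items the user has not seen."""
    scores = np.asarray(scores, dtype=float)
    candidates = np.flatnonzero(~np.asarray(already_seen, dtype=bool))
    order = np.argsort(scores[candidates])[::-1]   # descending by score
    return candidates[order[:n]]

predicted = [4.1, 3.2, 4.8, 2.0, 4.5]          # hypothetical predicted ratings
seen = [False, False, True, False, False]      # item 2 was already watched
print(top_n_items(predicted, seen, n=3))       # item 2 is skipped despite its top score
```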

---

## 🔑 Types of Collaborative Filtering

| Type                              | Description                                                             |
|------------------------------------|-------------------------------------------------------------------------|
| **User-Based Collaborative Filtering** | Recommends items liked by similar users.                               |
| **Item-Based Collaborative Filtering** | Recommends items similar to those the user has interacted with.         |

---

## ⚠️ Challenges in Collaborative Filtering
- **Cold Start Problem:** Difficult to recommend for new users or items without enough data.
- **Data Sparsity:** Users interact with a small subset of items → large sparse matrices.
- **Scalability:** Computing similarity for large datasets can be computationally expensive.
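
To give a feel for the sparsity and scalability points, the sketch below builds a synthetic interaction matrix and measures how empty it is. The sizes (1,000 users, 500 items, 10 ratings per user) are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 1000, 500

# Hypothetical interactions: each user rates 10 random items
interactions = np.zeros((n_users, n_items), dtype=bool)
for u in range(n_users):
    interactions[u, rng.choice(n_items, size=10, replace=False)] = True

density = interactions.mean()                  # share of filled cells
print(f"density = {density:.1%}, sparsity = {1 - density:.1%}")

# User-user similarity needs one score per pair of users
n_pairs = n_users * (n_users - 1) // 2
print(f"user pairs to compare: {n_pairs:,}")
```

Even in this small setup only 2% of the cells are filled, and the number of user pairs grows quadratically with the number of users.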

---

## ✅ Example Use Cases:
- Netflix recommending shows based on other users’ watch history.
- Amazon suggesting products based on user purchase behavior.
- Spotify recommending songs based on user listening habits.


