# Recommendation System

A Recommendation System is a machine learning system that suggests items to users based on their preferences, behavior, or similarities with other users.

- Reduce information overload
- Improve user experience
- Increase engagement and sales
- Provide personalized content

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Load The DataSet

data = pd.read_csv('anime.csv')
data.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [3]:
data.shape

(12294, 7)

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [5]:
# cheking null values
data.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [6]:
# Handling null values

data['genre'] = data['genre'].fillna('')
data['type'] = data['type'].fillna(data['type'].mode()[0])
data['rating'] = data['rating'].fillna(data['rating'].mean())

# Converting Episodes to numeric
data['episodes'] = data['episodes'].replace('Unknown', 0).astype(int)
data['episodes'] = data['episodes'].fillna(data['episodes'].median())

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12294 non-null  object 
 3   type      12294 non-null  object 
 4   episodes  12294 non-null  int64  
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(3), object(3)
memory usage: 672.5+ KB


In [8]:
data.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


#### deviding features as per Variable type for normalizing & encoding

In [9]:
# Normalizing for num_data 

from sklearn.preprocessing import MinMaxScaler

num_features = data[['rating', 'members', 'episodes']]
scaler = MinMaxScaler()
num_features_scaled = scaler.fit_transform(num_features)

#### Feature Extraction
- Converting categorical with using TF-IDF becz multi label Text data
- data['genre'] = here text as multi labeld

In [10]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')
genre_features = tfidf.fit_transform(data['genre'])

In [11]:
from scipy.sparse import hstack

final_features = hstack([genre_features, num_features_scaled]).tocsr()


In [12]:
final_features.shape

(12294, 49)

#### Building Recommendation System with using cosine_similarity

In [13]:
from sklearn.metrics.pairwise import cosine_similarity

sim_cos = cosine_similarity(final_features)

In [14]:
def recommend_anime(anime_index, n=5):
    scores = sim_cos[anime_index]
    top_indices = scores.argsort()[::-1][1:n+1]

    # Prepare a DataFrame with anime names and similarity scores
    recommendations = data.iloc[top_indices].copy()
    recommendations['similarity'] = scores[top_indices]

    print(f"Recommendations for: {data.iloc[anime_index]['name']}")
    return recommendations


In [15]:
recommend_anime(500, n=5)

Recommendations for: Sayonara Zetsubou Sensei


Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,similarity
502,3228,Zoku Sayonara Zetsubou Sensei,"Comedy, Parody, School",TV,13,8.03,74040,0.99696
522,6377,Zan Sayonara Zetsubou Sensei,"Comedy, Parody, School",TV,13,8.01,55402,0.995722
575,4872,Goku Sayonara Zetsubou Sensei,"Comedy, Parody, School",OVA,3,7.96,40358,0.994553
1363,490,Paniponi Dash!,"Comedy, Parody, School",TV,26,7.57,38532,0.99384
618,7044,Zan Sayonara Zetsubou Sensei Bangaichi,"Comedy, Parody, School",OVA,2,7.94,22704,0.993004


In [16]:
# Validation

scores = cosine_similarity( final_features)
print("Min:", scores.min())
print("Max:", scores.max())
print("Mean:", scores.mean())

Min: 4.6554254865115283e-10
Max: 1.0000000000000007
Mean: 0.3282680068136551


## Applying threshold for Recommend System

In [17]:
# Maping the Features
anime_to_index = pd.Series(data.index, index=data['name'])
index_to_anime = data['name']


#### Building Recommendation System with using cosine_similarity

In [18]:
def recommend_anime(anime_name, top_n=5, threshold=0.5):

    # Step 1: Get index of the anime
    anime_index = anime_to_index[anime_name]

    # Step 2: Compute cosine similarity
    similarity_scores = cosine_similarity(final_features[anime_index],final_features).flatten()

    # Step 3: Create result table
    result = pd.DataFrame({
        'Anime_Index': data.index,
        'Anime_Name': data['name'],
        'Anime_Rating': data['rating'],
        'Similarity_Score': similarity_scores
    })

    # Step 4: Remove same anime & apply threshold
    result = result[(result['Similarity_Score'] >= threshold) &(result['Anime_Name'] != anime_name)]

    # Step 5: Sort and return top N
    return result.sort_values(by='Similarity_Score',ascending=False).head(top_n)


In [19]:
recommend_anime("_Summer Specials", top_n=10, threshold=0.1)

Unnamed: 0,Anime_Index,Anime_Name,Anime_Rating,Similarity_Score
5965,5965,Tokimeki Memorial 4 OVA,6.3,0.997591
6214,6214,Mashiro-iro Symphony: Airi ga Anata no Kanojo ...,6.21,0.862992
11465,11465,Madonna: Kanjuku Body Collection,6.57,0.846875
11619,11619,Guren,6.36,0.846841
11433,11433,Kindan no Byoutou The Animation,6.62,0.846839
11423,11423,Yobai Suru Shichinin no Harame,6.63,0.84683
11377,11377,Harukoi Otome,6.73,0.846702
11343,11343,Reunion,6.79,0.846595
11266,11266,Shoujo x Shoujo x Shoujo The Animation,6.94,0.84623
11240,11240,Shocking Pink!,7.0,0.846046


In [20]:
recommend_anime("_Summer Specials", top_n=10, threshold=0.8)

Unnamed: 0,Anime_Index,Anime_Name,Anime_Rating,Similarity_Score
5965,5965,Tokimeki Memorial 4 OVA,6.3,0.997591
6214,6214,Mashiro-iro Symphony: Airi ga Anata no Kanojo ...,6.21,0.862992
11465,11465,Madonna: Kanjuku Body Collection,6.57,0.846875
11619,11619,Guren,6.36,0.846841
11433,11433,Kindan no Byoutou The Animation,6.62,0.846839
11423,11423,Yobai Suru Shichinin no Harame,6.63,0.84683
11377,11377,Harukoi Otome,6.73,0.846702
11343,11343,Reunion,6.79,0.846595
11266,11266,Shoujo x Shoujo x Shoujo The Animation,6.94,0.84623
11240,11240,Shocking Pink!,7.0,0.846046


In [21]:
recommend_anime("_Summer Specials", top_n=10, threshold=0.9)

Unnamed: 0,Anime_Index,Anime_Name,Anime_Rating,Similarity_Score
5965,5965,Tokimeki Memorial 4 OVA,6.3,0.997591


Expected:
- High threshold → fewer results
- Low threshold → more results

#### Validation : Similarity Score Distribution Check

In [22]:
anime_idx = anime_to_index["Naruto"]

scores = cosine_similarity(
    final_features[anime_idx],
    final_features
).flatten()

print("Min:", scores.min())
print("Max:", scores.max())
print("Mean:", scores.mean())


Min: 6.847989323662131e-05
Max: 1.0
Mean: 0.32325281185068155


#### Interview Questions:
1. Can you explain the difference between user-based and item-based collaborative filtering?
2. What is collaborative filtering, and how does it work?

### 1. Can you explain the difference between user-based and item-based collaborative filtering?

#### User-Based Collaborative Filtering (UBCF)
- Finds users who behave similarly.
- Recommends items that similar users liked
- Less stable (user behavior changes)

Example:
If User A and User B like similar movies, recommend B’s movies to A.

#### Item-Based Collaborative Filtering
- Finds items that are similar based on user behavior.
- Recommends items similar to what the user already likes.
- More stable (item relationships stay same)

Example:
If Movie X and Movie Y are watched by many of the same users, recommend Y if the user liked X.

### 2. What is collaborative filtering, and how does it work?
Collaborative Filtering is a recommendation method that uses the behavior of many users to predict what a target user will like.