# Recommendation System

## 1. Data Preprocessing

In [30]:
import pandas as pd

In [31]:
df = pd.read_csv('/content/anime.csv')

In [32]:
df.head()

,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [34]:
df.describe(include='all')

,anime_id,name,genre,type,episodes,rating,members
count,12294.0,12294,12232,12269,12294.0,12064.0,12294.0
unique,,12292,3264,6,187.0,,
top,,Saru Kani Gassen,Hentai,TV,1.0,,
freq,,2,823,3787,5677.0,,
mean,14058.221653,,,,,6.473902,18071.34
std,11455.294701,,,,,1.026746,54820.68
min,1.0,,,,,1.67,5.0
25%,3484.25,,,,,5.88,225.0
50%,10260.5,,,,,6.57,1550.0
75%,24794.5,,,,,7.18,9437.0


In [35]:
df.isnull().sum()

,0
anime_id,0
name,0
genre,62
type,25
episodes,0
rating,230
members,0


In [36]:
df['rating'] = df['rating'].fillna(df['rating'].mean())

In [37]:
df = df.dropna(subset=['genre', 'type'])

In [38]:
df.head()

,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [39]:
df.isnull().sum()

,0
anime_id,0
name,0
genre,0
type,0
episodes,0
rating,0
members,0


## 2. Feature Extraction

In [40]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [41]:
tfidf = TfidfVectorizer(stop_words='english')
genre_matrix = tfidf.fit_transform(df['genre'])
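
The default `TfidfVectorizer` tokenizer splits on word boundaries, so a genre such as "Sci-Fi" is broken into separate tokens, and `stop_words='english'` removes words like "of". A minimal sketch of an alternative, assuming genres are comma-separated as in the `df.head()` output above, is to treat each whole genre as one token. The names `split_genres` and `sample_genres` and the sample strings are illustrative and not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def split_genres(text):
    """Split a comma-separated genre string into whole-genre tokens."""
    return [g.strip() for g in text.split(',') if g.strip()]


# Made-up sample rows in the same "A, B, C" style as the genre column.
sample_genres = [
    "Sci-Fi, Thriller",
    "Comedy, Slice of Life",
    "Action, Sci-Fi",
]

# Default behaviour: word-level tokens plus English stop-word removal.
default_vocab = sorted(
    TfidfVectorizer(stop_words='english').fit(sample_genres).vocabulary_
)

# Whole-genre tokens; token_pattern=None because a custom tokenizer is used.
genre_vocab = sorted(
    TfidfVectorizer(tokenizer=split_genres, token_pattern=None)
    .fit(sample_genres)
    .vocabulary_
)

print(default_vocab)
print(genre_vocab)
```

Whether whole-genre tokens give better recommendations than word-level tokens on this dataset is not something the notebook measures; the sketch only shows how the two vocabularies differ.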

In [42]:
from sklearn.preprocessing import MinMaxScaler

In [46]:
df['episodes'] = df['episodes'].replace('Unknown', pd.NA)

In [47]:
# errors='coerce' turns any remaining non-numeric value into NaN
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')


In [48]:
df['episodes'] = df['episodes'].fillna(df['episodes'].median())

In [71]:
scaler = MinMaxScaler()
num_features = scaler.fit_transform(df[['episodes', 'rating', 'members']])


In [50]:
from scipy.sparse import hstack

final_matrix = hstack([genre_matrix, num_features])

## 3. Recommendation System

In [51]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(final_matrix)
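
`cosine_similarity(final_matrix)` builds a dense n × n array; for roughly 12,000 rows of `float64` that is on the order of 1 GB. When memory is tight, one option is to compute only the row that is needed at query time. The sketch below uses a small random matrix as a stand-in for `final_matrix`; `similarity_row` and the `toy_*` names are hypothetical helpers, not part of the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for final_matrix: 6 rows, 4 sparse "genre" columns
# and 3 dense numeric columns (all values are random placeholders).
rng = np.random.default_rng(0)
toy_genres = csr_matrix(rng.integers(0, 2, size=(6, 4)).astype(float))
toy_numeric = rng.random((6, 3))
toy_matrix = hstack([toy_genres, toy_numeric]).tocsr()  # CSR allows row slicing


def similarity_row(matrix, position):
    """Cosine similarity of one row (by position) against every row."""
    return cosine_similarity(matrix[position], matrix).ravel()


row = similarity_row(toy_matrix, 2)

# Sanity check against the full matrix, which is cheap at this toy size.
full = cosine_similarity(toy_matrix)
print(np.allclose(row, full[2]))
```

In the notebook this would correspond to something like `similarity_row(final_matrix.tocsr(), idx)`; calling `.tocsr()` first ensures the stacked matrix supports row indexing.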


In [52]:
def recommend_anime(title, top_n=10, threshold=0.3):
    # Locate the title by row position: cosine_sim is indexed by position,
    # while df.index still carries the original labels after dropna().
    positions = (df['name'] == title).to_numpy().nonzero()[0]
    if len(positions) == 0:
        return "Anime not found in the dataset."

    idx = positions[0]

    # similarity scores against every row
    scores = list(enumerate(cosine_sim[idx]))

    # keep rows at or above the threshold, excluding the query itself
    filtered = [(i, s) for i, s in scores if s >= threshold and i != idx]

    # sort high to low
    filtered = sorted(filtered, key=lambda x: x[1], reverse=True)

    # top n recommendations (positions, so use iloc)
    top_indices = [i for i, s in filtered[:top_n]]

    return df['name'].iloc[top_indices]


In [64]:
recommend_anime("Naruto", top_n=5, threshold=0.2)


,name
615,Naruto: Shippuuden
206,Dragon Ball Z
346,Dragon Ball
1472,Naruto: Shippuuden Movie 4 - The Lost Tower
1573,Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...


In [65]:
for t in [0.1, 0.2, 0.3, 0.4]:
    print(f"\nThreshold = {t}")
    print(recommend_anime("Naruto", top_n=5, threshold=t))


Threshold = 0.1
615                                    Naruto: Shippuuden
206                                         Dragon Ball Z
346                                           Dragon Ball
1472          Naruto: Shippuuden Movie 4 - The Lost Tower
1573    Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...
Name: name, dtype: object

Threshold = 0.2
615                                    Naruto: Shippuuden
206                                         Dragon Ball Z
346                                           Dragon Ball
1472          Naruto: Shippuuden Movie 4 - The Lost Tower
1573    Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...
Name: name, dtype: object

Threshold = 0.3
615                                    Naruto: Shippuuden
206                                         Dragon Ball Z
346                                           Dragon Ball
1472          Naruto: Shippuuden Movie 4 - The Lost Tower
1573    Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...
Name: name, dtype: object



In [66]:
for t in [0.1, 0.2, 0.3, 0.4]:
    print(f"\nThreshold = {t}")
    print(recommend_anime("Dragon Ball", top_n=5, threshold=t))


Threshold = 0.1
206                             Dragon Ball Z
588                           Dragon Ball Kai
1930                        Dragon Ball Super
515                    Dragon Ball Kai (2014)
1409    Dragon Ball Z Movie 15: Fukkatsu no F
Name: name, dtype: object

Threshold = 0.2
206                             Dragon Ball Z
588                           Dragon Ball Kai
1930                        Dragon Ball Super
515                    Dragon Ball Kai (2014)
1409    Dragon Ball Z Movie 15: Fukkatsu no F
Name: name, dtype: object

Threshold = 0.3
206                             Dragon Ball Z
588                           Dragon Ball Kai
1930                        Dragon Ball Super
515                    Dragon Ball Kai (2014)
1409    Dragon Ball Z Movie 15: Fukkatsu no F
Name: name, dtype: object

Threshold = 0.4
206                             Dragon Ball Z
588                           Dragon Ball Kai
1930                        Dragon Ball Super
515                    Dra

In [67]:
for t in [0.1, 0.2, 0.3, 0.4]:
    print(f"\nThreshold = {t}")
    print(recommend_anime("One Piece", top_n=5, threshold=t))


Threshold = 0.1
241    One Piece: Episode of Nami - Koukaishi no Nami...
86                                    Shingeki no Kyojin
231    One Piece: Episode of Merry - Mou Hitori no Na...
896    One Piece: Episode of Sabo - 3 Kyoudai no Kizu...
6                                 Hunter x Hunter (2011)
Name: name, dtype: object

Threshold = 0.2
241    One Piece: Episode of Nami - Koukaishi no Nami...
86                                    Shingeki no Kyojin
231    One Piece: Episode of Merry - Mou Hitori no Na...
896    One Piece: Episode of Sabo - 3 Kyoudai no Kizu...
6                                 Hunter x Hunter (2011)
Name: name, dtype: object

Threshold = 0.3
241    One Piece: Episode of Nami - Koukaishi no Nami...
86                                    Shingeki no Kyojin
231    One Piece: Episode of Merry - Mou Hitori no Na...
896    One Piece: Episode of Sabo - 3 Kyoudai no Kizu...
6                                 Hunter x Hunter (2011)
Name: name, dtype: object

Threshold = 0.4

# Interview Questions


1. Can you explain the difference between user-based and item-based collaborative filtering?

User-based and item-based collaborative filtering both rely on similarity computed from rating or viewing behavior; they differ in what is being compared. User-based collaborative filtering looks for people whose rating patterns resemble yours and recommends items they liked that you haven't seen yet. Item-based collaborative filtering works the other way around: it finds items that tend to be rated or watched by the same users as the items you already enjoy. So if you watched an anime like Naruto, the system looks for other anime with similar rating or viewing patterns, such as Bleach or One Piece. In short, user-based filtering models relationships between people, while item-based filtering models relationships between items. In practice, item-based filtering is often more stable because the similarity between two items tends to change more slowly than an individual user's behavior, so item-to-item similarities can be precomputed and reused, which helps large systems respond quickly.
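
The difference comes down to which axis of the rating table is compared. The following is a minimal sketch on a made-up rating table: the user names and all rating values are hypothetical, and cosine similarity is used here as one common choice of similarity measure, not the only one.

```python
from math import sqrt

# Toy ratings (1-5). Users and values are invented for illustration.
ratings = {
    "ana":   {"Naruto": 5, "Bleach": 4, "One Piece": 5},
    "ben":   {"Naruto": 4, "Bleach": 5, "Steins;Gate": 2},
    "chloe": {"Steins;Gate": 5, "Dragon Ball": 4, "Naruto": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# User-based: compare ana's row of ratings with every other user's row.
user_sim = {
    user: cosine(ratings["ana"], row)
    for user, row in ratings.items()
    if user != "ana"
}


def item_vector(item):
    """Column of the rating table: every user's rating for one item."""
    return {user: row[item] for user, row in ratings.items() if item in row}


# Item-based: compare Naruto's column with every other item's column.
items = sorted({item for row in ratings.values() for item in row})
item_sim = {
    item: cosine(item_vector("Naruto"), item_vector(item))
    for item in items
    if item != "Naruto"
}

print(max(user_sim, key=user_sim.get))  # user most similar to ana
print(max(item_sim, key=item_sim.get))  # item most similar to Naruto
```

On this toy table the user-based view picks the neighbour whose ratings overlap most with ana's, while the item-based view picks the anime that the same users rated in a similar way to Naruto.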

2. What is collaborative filtering and how does it work?

Collaborative filtering is a recommendation approach that learns from the preferences of many users. Instead of relying on detailed information about items, it assumes that people who behaved similarly in the past will behave similarly in the future. For example, if two users have rated many of the same anime highly, the system treats their tastes as alike and recommends anime that one of them enjoyed to the other, who hasn't watched them yet. The method builds similarity patterns, either between users or between items, and uses them to predict what someone may like. It is popular because it needs no knowledge of the content itself, only the interaction data (ratings, views) collected from the crowd.
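
The prediction step can be sketched as a similarity-weighted average of neighbours' ratings. This is a minimal illustration under stated assumptions: the users, items and ratings are invented, similarity is cosine over co-rated items, and real systems usually add refinements such as mean-centering that are omitted here.

```python
from math import sqrt

# Toy ratings (1-5). "target" has not rated item "D" yet.
ratings = {
    "target": {"A": 5, "B": 3, "C": 4},
    "n1":     {"A": 5, "B": 2, "C": 4, "D": 5},
    "n2":     {"A": 1, "B": 5, "C": 2, "D": 1},
    "n3":     {"A": 4, "B": 3, "C": 5, "D": 4},
}


def cosine_on_shared(a, b):
    """Cosine similarity restricted to items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, row in ratings.items():
        if other == user or item not in row:
            continue
        sim = cosine_on_shared(ratings[user], row)
        weighted_sum += sim * row[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


predicted = predict("target", "D")
print(round(predicted, 2))
```

Neighbours whose past ratings look like the target's contribute more to the estimate, which is the "similar users will behave similarly" assumption expressed in arithmetic.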