# Recommendation System

### 1. Data Preprocessing

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('anime.csv')
df.head()

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama' | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [4]:
# exploring the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [5]:
df.columns

Index(['anime_id', 'name', 'genre', 'type', 'episodes', 'rating', 'members'], dtype='object')

In [6]:
#checking for missing values
df.isnull().sum()

| Column | Missing values |
|---|---|
| anime_id | 0 |
| name | 0 |
| genre | 62 |
| type | 25 |
| episodes | 0 |
| rating | 230 |
| members | 0 |


In [8]:
# Handling Missing Values
df['rating'] = df['rating'].fillna(df['rating'].mean())

In [13]:
# assign the result back; fillna(..., inplace=True) on a selected column is deprecated
df['genre'] = df['genre'].fillna('Unknown')


In [10]:
# 'Unknown' episode counts are stored as 0 so the column can be cast to int
df['episodes'] = df['episodes'].replace('Unknown', 0).astype(int)

In [15]:
# assign the result back; fillna(..., inplace=True) on a selected column is deprecated
df['type'] = df['type'].fillna('Unknown')


In [16]:
df.describe()

| | anime_id | episodes | rating | members |
|---|---|---|---|---|
| count | 12294.0 | 12294.0 | 12294.0 | 12294.0 |
| mean | 14058.221653 | 12.040101 | 6.473902 | 18071.34 |
| std | 11455.294701 | 46.257299 | 1.017096 | 54820.68 |
| min | 1.0 | 0.0 | 1.67 | 5.0 |
| 25% | 3484.25 | 1.0 | 5.9 | 225.0 |
| 50% | 10260.5 | 2.0 | 6.55 | 1550.0 |
| 75% | 24794.5 | 12.0 | 7.17 | 9437.0 |
| max | 34527.0 | 1818.0 | 10.0 | 1013917.0 |


In [17]:
df.isnull().sum()

| Column | Missing values |
|---|---|
| anime_id | 0 |
| name | 0 |
| genre | 0 |
| type | 0 |
| episodes | 0 |
| rating | 0 |
| members | 0 |


The dataset was loaded and explored. Missing ratings were filled with the column mean, missing genres and types with 'Unknown', and 'Unknown' episode counts with 0 so that the column could be cast to integers for recommendation modeling.
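
The preprocessing steps above can be sketched on a small frame. This is a minimal illustration, not the notebook's exact code: the `toy` frame and its values are made up, and `pd.to_numeric(..., errors="coerce")` is an alternative to the `replace('Unknown', 0)` call used above.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of gaps as anime.csv (values are made up).
toy = pd.DataFrame({
    "genre": ["Action", None, "Drama"],
    "type": ["TV", "Movie", None],
    "episodes": ["12", "Unknown", "1"],
    "rating": [8.0, np.nan, 6.0],
})

# Assign the result back instead of calling fillna(..., inplace=True) on a column.
toy["rating"] = toy["rating"].fillna(toy["rating"].mean())
toy[["genre", "type"]] = toy[["genre", "type"]].fillna("Unknown")

# to_numeric turns non-numeric strings such as "Unknown" into NaN, which is then set to 0.
toy["episodes"] = (
    pd.to_numeric(toy["episodes"], errors="coerce").fillna(0).astype(int)
)

print(toy)
```

Storing unknown episode counts as 0 keeps the column numeric, but it also pulls summary statistics such as the minimum and mean downward.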

### 2. Feature Extraction

To compute similarity between anime, the following features are selected:
* Genre → captures content similarity (most important)

* Rating → reflects user preference

* Members → indicates popularity

These features are combined so that cosine similarity reflects both content (genre) and numerical signals of reception (rating, popularity).


In [19]:
#converting categorical features into numerical form
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')
genre_tfidf = tfidf.fit_transform(df['genre'])
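
One detail worth checking is how the genre strings are tokenized. With default settings, `TfidfVectorizer` splits on word boundaries, so a multi-word or hyphenated label is broken into several tokens. The sketch below compares that with a comma-based tokenizer; `split_genres` and the sample `genres` list are hypothetical and only illustrate the difference.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

genres = ["Sci-Fi, Thriller", "Slice of Life, Comedy", "Action, Super Power"]

# Default word-level tokenization, as in the cell above.
word_level = TfidfVectorizer(stop_words="english").fit(genres)

def split_genres(text):
    """Hypothetical helper: one token per comma-separated genre label."""
    return [g.strip() for g in text.split(",") if g.strip()]

# token_pattern=None avoids the warning that the pattern is unused
# when a custom tokenizer is supplied.
label_level = TfidfVectorizer(
    tokenizer=split_genres, token_pattern=None, lowercase=False
).fit(genres)

print(sorted(word_level.vocabulary_))
print(sorted(label_level.vocabulary_))
```

Whether whole labels or word fragments work better is a design choice. Word-level tokens let "Super Power" and "Supernatural" stay distinct, but they also let unrelated labels share fragments.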

In [20]:
# Standardizing numerical features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
num_features = scaler.fit_transform(df[['rating', 'members']])

In [21]:
# combining all features into a single feature matrix
from scipy.sparse import hstack

final_features = hstack([genre_tfidf, num_features ])

Genres were vectorized using TF-IDF, the numerical features (rating, members) were standardized, and the two blocks were stacked into a single sparse feature matrix for computing cosine similarity.
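
When blocks with different scales are stacked, the block with the larger values can dominate the cosine similarity. TF-IDF rows have unit length by default, while a standardized column such as `members` can take values several standard deviations from zero. The sketch below uses made-up numbers and a hypothetical `combined_similarity` helper to show the effect of per-block weights; the weights are illustrative, not tuned.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.metrics.pairwise import cosine_similarity

# Two titles with disjoint genres (unit-length, TF-IDF-like rows) ...
genre_part = csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0]]))
# ... but the same large standardized popularity value (made-up number).
numeric_part = np.array([[5.0], [5.0]])

def combined_similarity(genre_weight, numeric_weight):
    """Hypothetical helper: cosine similarity after per-block weighting."""
    features = hstack([genre_part * genre_weight, numeric_part * numeric_weight])
    return cosine_similarity(features)[0, 1]

unweighted = combined_similarity(1.0, 1.0)  # popularity dominates
reweighted = combined_similarity(1.0, 0.1)  # genre overlap matters again

print(round(unweighted, 3), round(reweighted, 3))
```

With equal weights the two titles look almost identical despite sharing no genre. Shrinking the numeric block lowers their similarity sharply.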

### 3. Recommendation System

In [22]:
#computing cosine similarity matrix
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(final_features, final_features)
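
The full matrix holds one value per pair of titles. With the 12294 rows reported by `df.info()`, that is 12294 × 12294 entries, roughly 1.2 GB if stored as 64-bit floats. A possible alternative, sketched below, is to compute one row of similarities on demand. The `features` matrix and the `similarity_row` helper are hypothetical stand-ins; the notebook's `final_features` may need `.tocsr()` before it can be indexed by row.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for final_features: 4 items, 3 features (made-up values).
features = csr_matrix(np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
]))

def similarity_row(matrix, idx):
    """Hypothetical helper: similarities of item `idx` to every item."""
    return cosine_similarity(matrix[idx], matrix).ravel()

row = similarity_row(features, 0)
full = cosine_similarity(features)

# The on-demand row matches the corresponding row of the full matrix.
print(np.allclose(row, full[0]))
```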

In [24]:
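# creating a mapping from anime title to index; duplicates are filtered on the
# index because Series.drop_duplicates() compares values, not labels
anime_indices = pd.Series(df.index, index=df['name'])
anime_indices = anime_indices[~anime_indices.index.duplicated(keep='first')]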
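
If a title appears more than once, a lookup keyed by title returns several positions instead of one. The sketch below, on made-up titles, shows that `Series.drop_duplicates()` does not remove repeated index labels, while filtering with `index.duplicated()` does.

```python
import pandas as pd

names = pd.Series(["A", "B", "A"], name="name")  # made-up titles, "A" repeats
lookup = pd.Series(names.index, index=names)

# drop_duplicates() compares the values (row positions), which are all unique,
# so the repeated label "A" survives and lookup["A"] returns a Series.
still_duplicated = lookup.drop_duplicates()

# Filtering on the index keeps one row position per title.
unique_lookup = lookup[~lookup.index.duplicated(keep="first")]

print(len(still_duplicated), len(unique_lookup))
```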

In [25]:
# Recommendation Function

def recommend_anime_with_threshold(anime_title, threshold=0.5, max_n=20):
    """Recommend up to max_n anime whose cosine similarity to anime_title is >= threshold."""

    idx = anime_indices[anime_title]
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Keep titles at or above the threshold, excluding the query itself
    filtered = [
        (i, score) for i, score in sim_scores
        if score >= threshold and i != idx
    ]

    # Sort by similarity, highest first, and keep the top max_n
    filtered = sorted(filtered, key=lambda x: x[1], reverse=True)[:max_n]

    anime_idxs = [i for i, _ in filtered]

    return df.loc[anime_idxs, ["name", "genre", "rating"]]
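
The filtering logic can be checked in isolation on a short list of scores. The helper below, `recommend_from_scores`, is a hypothetical NumPy variant of the same idea, not the notebook's function. It shows that changing the threshold only changes the result when some scores fall between the old and new cut-offs.

```python
import numpy as np

def recommend_from_scores(scores, query_idx, threshold=0.5, max_n=20):
    """Hypothetical helper: indices with score >= threshold, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep = [
        int(i) for i in order
        if i != query_idx and scores[i] >= threshold
    ]
    return keep[:max_n]

scores = [1.0, 0.9, 0.2, 0.6]  # made-up similarities to item 0

strict = recommend_from_scores(scores, query_idx=0, threshold=0.7)
moderate = recommend_from_scores(scores, query_idx=0, threshold=0.5)
broad = recommend_from_scores(scores, query_idx=0, threshold=0.1)

print(strict, moderate, broad)
```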


In [27]:
# experimenting with different threshold values
recommend_anime_with_threshold("Naruto", threshold=0.7) # very strict recommendations


| | name | genre | rating |
|---|---|---|---|
| 615 | Naruto: Shippuuden | Action, Comedy, Martial Arts, Shounen, Super P... | 7.94 |
| 86 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | 8.54 |
| 582 | Bleach | Action, Comedy, Shounen, Super Power, Supernat... | 7.95 |
| 40 | Death Note | Mystery, Police, Psychological, Supernatural, ... | 8.71 |
| 804 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | 7.83 |
| 159 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | 8.39 |
| 19 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super... | 8.83 |
| 445 | Mirai Nikki (TV) | Action, Mystery, Psychological, Shounen, Super... | 8.07 |
| 440 | Soul Eater | Action, Adventure, Comedy, Fantasy, Shounen, S... | 8.08 |
| 643 | Ao no Exorcist | Action, Demons, Fantasy, Shounen, Supernatural | 7.92 |


In [28]:
recommend_anime_with_threshold("Naruto", threshold=0.5) # moderate similarity

| | name | genre | rating |
|---|---|---|---|
| 615 | Naruto: Shippuuden | Action, Comedy, Martial Arts, Shounen, Super P... | 7.94 |
| 86 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | 8.54 |
| 582 | Bleach | Action, Comedy, Shounen, Super Power, Supernat... | 7.95 |
| 40 | Death Note | Mystery, Police, Psychological, Supernatural, ... | 8.71 |
| 804 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | 7.83 |
| 159 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | 8.39 |
| 19 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super... | 8.83 |
| 445 | Mirai Nikki (TV) | Action, Mystery, Psychological, Shounen, Super... | 8.07 |
| 440 | Soul Eater | Action, Adventure, Comedy, Fantasy, Shounen, S... | 8.08 |
| 643 | Ao no Exorcist | Action, Demons, Fantasy, Shounen, Supernatural | 7.92 |


In [29]:
recommend_anime_with_threshold("Naruto", threshold=0.3) # broad recommendations

| | name | genre | rating |
|---|---|---|---|
| 615 | Naruto: Shippuuden | Action, Comedy, Martial Arts, Shounen, Super P... | 7.94 |
| 86 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | 8.54 |
| 582 | Bleach | Action, Comedy, Shounen, Super Power, Supernat... | 7.95 |
| 40 | Death Note | Mystery, Police, Psychological, Supernatural, ... | 8.71 |
| 804 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | 7.83 |
| 159 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | 8.39 |
| 19 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super... | 8.83 |
| 445 | Mirai Nikki (TV) | Action, Mystery, Psychological, Shounen, Super... | 8.07 |
| 440 | Soul Eater | Action, Adventure, Comedy, Fantasy, Shounen, S... | 8.08 |
| 643 | Ao no Exorcist | Action, Demons, Fantasy, Shounen, Supernatural | 7.92 |


Analyzing the performance of the recommendation system:
* Performance Analysis: The system ranks recommendations by cosine similarity, with `threshold` setting the minimum similarity and `max_n` capping the list length. In the runs above, thresholds of 0.7, 0.5, and 0.3 all returned the same ten titles for "Naruto", so the threshold had no visible effect on this query.

* Areas of Improvement: The system can be improved by adding user personalization, more features (type, episodes), weighted similarity, and proper evaluation metrics.
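
One common evaluation metric is precision at k: the share of the top k recommendations that are actually relevant. The sketch below assumes a ground-truth set of relevant titles is available, which this notebook does not have; the function name, titles, and `relevant` set are all hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Made-up example: four recommendations, two of which the user liked.
recommended = ["Bleach", "Death Note", "Soul Eater", "Angel Beats!"]
relevant = {"Bleach", "Soul Eater"}

print(precision_at_k(recommended, relevant, k=2))
print(precision_at_k(recommended, relevant, k=4))
```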


### Interview Questions

1. Can you explain the difference between user-based and item-based collaborative filtering?

* User-based collaborative filtering finds users with similar preferences and recommends items they liked, while item-based collaborative filtering recommends items that are similar to those the user has already interacted with.


2. What is collaborative filtering, and how does it work?

* Collaborative filtering is a recommendation approach that uses user–item interaction data to identify patterns and recommend items based on similarities between users or items, without requiring item content information.
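
The two answers above can be illustrated on a tiny user-item rating matrix. This is a minimal sketch with made-up ratings. The helpers `most_similar` and `predict_item_based` are hypothetical, and treating unrated entries as 0 when computing similarity is a simplification.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows are users, columns are items; 0 means "not rated" (made-up ratings).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
])

user_sim = cosine_similarity(ratings)    # (3, 3): user-to-user
item_sim = cosine_similarity(ratings.T)  # (4, 4): item-to-item

def most_similar(sim_matrix, idx):
    """Hypothetical helper: index of the nearest neighbour, excluding idx."""
    sims = sim_matrix[idx].copy()
    sims[idx] = -np.inf
    return int(np.argmax(sims))

def predict_item_based(user, item):
    """Similarity-weighted average of the ratings `user` gave to other items."""
    rated = ratings[user] > 0
    rated[item] = False
    weights = item_sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[user, rated] / weights.sum())

nearest_user = most_similar(user_sim, 0)  # user-based neighbour of user 0
nearest_item = most_similar(item_sim, 0)  # item-based neighbour of item 0
predicted = predict_item_based(user=0, item=2)

print(nearest_user, nearest_item, round(predicted, 2))
```

User-based filtering would recommend what the nearest user liked, while item-based filtering scores an unseen item from the user's own ratings of similar items.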