# Recommendation System


### Data Preprocessing:

Load the dataset into a suitable data structure (e.g., pandas DataFrame).
Handle missing values, if any.
Explore the dataset to understand its structure and attributes.

In [1]:
import pandas as pd
df=pd.read_csv("anime.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [2]:
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


In [3]:
df.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64
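
`isnull()` only counts real missing values. The `episodes` column is stored as `object` and shows no nulls, but it can still hold placeholder strings such as `'Unknown'` (handled later in the feature-extraction step). A minimal sketch of how such placeholders could be counted, using a small made-up frame named `toy` rather than the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the 'episodes' column (not the real data)
toy = pd.DataFrame({"episodes": ["1", "64", "Unknown", "24"]})

# isnull() sees no missing values because 'Unknown' is an ordinary string
null_count = int(toy["episodes"].isnull().sum())

# Coercing to numeric turns every non-numeric placeholder into NaN
as_number = pd.to_numeric(toy["episodes"], errors="coerce")
placeholder_count = int(as_number.isna().sum())

print(null_count, placeholder_count)
```

Running the same two checks on `df['episodes']` would show how many rows need special handling before scaling.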

In [5]:
# Handle missing values
df['genre'] = df['genre'].fillna('Unknown')
df['type'] = df['type'].fillna('Unknown')
df['rating'] = df['rating'].fillna(df['rating'].mean())

print(df['genre'].value_counts().head())


genre
Hentai                   823
Comedy                   523
Music                    301
Kids                     199
Comedy, Slice of Life    179
Name: count, dtype: int64


### Feature Extraction:

Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
Convert categorical features into numerical representations if necessary.
Normalize numerical features if required.

1. Decide on features to use for similarity

For content-based recommendations with this dataset:
Use genre (categorical, multi-label).
Use rating (numeric).
Optionally use type, episodes, and members for extra context.

In [7]:
from sklearn.feature_extraction.text import CountVectorizer

# Use genre as a categorical feature
vectorizer = CountVectorizer(token_pattern='[^,]+')
genre_features = vectorizer.fit_transform(df['genre'].fillna('Unknown'))
# Each column in genre_features is one token produced by splitting on commas
# (the space after a comma stays part of the token); each row holds the token counts for one anime
genre_features

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 36346 stored elements and shape (12294, 83)>
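
One thing to check in the cell above: `token_pattern='[^,]+'` keeps the space that follows each comma, so a genre that appears first in one row and later in another can end up as two different columns. That may be why the matrix has 83 columns. The sketch below illustrates the effect on three made-up genre strings and shows one possible alternative; the helper name `split_genres` is hypothetical and not part of the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up genre strings for illustration only
genres = ["Drama, Romance", "Romance", "Action, Drama"]

# Same pattern as in the cell above: tokens keep their leading space
raw = CountVectorizer(token_pattern='[^,]+')
raw.fit(genres)
raw_vocab = sorted(raw.vocabulary_)


def split_genres(text):
    """Split a comma-separated genre string and strip surrounding spaces."""
    return [part.strip() for part in text.split(',') if part.strip()]


# A custom tokenizer avoids the duplicates; binary=True records presence/absence
clean = CountVectorizer(tokenizer=split_genres, token_pattern=None, binary=True)
clean_matrix = clean.fit_transform(genres)
clean_vocab = sorted(clean.vocabulary_)

print(raw_vocab)    # contains both ' drama' and 'drama'
print(clean_vocab)  # one entry per genre
```

With the stripped tokens, each genre maps to exactly one column, which matches the "one column per genre" reading of the feature matrix.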

In [8]:
type_dummies = pd.get_dummies(df['type'].fillna('Unknown'))
# type_dummies is a DataFrame with columns for each type (e.g., TV, OVA, Movie)


In [9]:
from sklearn.preprocessing import MinMaxScaler

# Fix episodes with 'Unknown' or non-integer values
df['episodes'] = pd.to_numeric(df['episodes'].replace('Unknown', '0'), errors='coerce').fillna(0)
num_features = df[['rating', 'episodes', 'members']].fillna(df[['rating', 'episodes', 'members']].mean())

scaler = MinMaxScaler()
normalized_num_features = scaler.fit_transform(num_features)
# normalized_num_features is a scaled array (all values between 0 and 1)
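
The three feature blocks built above (genre counts, type dummies, scaled numeric columns) can be stacked into a single matrix before computing similarity. The following is a minimal sketch on a made-up frame; the names `toy`, `split_genres`, and `anime_features` are assumptions for illustration, and no weighting between the blocks is applied.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical miniature of the anime table
toy = pd.DataFrame({
    "genre": ["Action, Drama", "Drama", "Comedy"],
    "type": ["TV", "Movie", "TV"],
    "rating": [8.0, 9.0, 6.5],
    "episodes": [24, 1, 12],
    "members": [1000, 500, 50],
})


def split_genres(text):
    """Split a comma-separated genre string and strip surrounding spaces."""
    return [part.strip() for part in text.split(',') if part.strip()]


# Genre block: sparse presence/absence matrix
genre_part = CountVectorizer(
    tokenizer=split_genres, token_pattern=None, binary=True
).fit_transform(toy["genre"])

# Type block: one-hot columns converted to floats
type_part = pd.get_dummies(toy["type"]).to_numpy(dtype=float)

# Numeric block: each column scaled to [0, 1]
num_part = MinMaxScaler().fit_transform(toy[["rating", "episodes", "members"]])

# Stack the blocks side by side into one sparse feature matrix
anime_features = sparse.hstack(
    [genre_part, sparse.csr_matrix(type_part), sparse.csr_matrix(num_part)]
).tocsr()

print(anime_features.shape)
```

Because every column counts equally in a cosine score, a block with many columns (such as genre) will tend to dominate. Scaling the blocks by a weight before stacking is one way to adjust that balance.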


### Recommendation System:

Design a function to recommend anime based on cosine similarity.
Given a target anime, recommend a list of similar anime based on cosine similarity scores.
Experiment with different threshold values for similarity scores to adjust the recommendation list size.
Analyze the performance of the recommendation system and identify areas of improvement.
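
For reference, the cosine similarity of two feature vectors is their dot product divided by the product of their lengths. A short sketch with made-up vectors (the function name `cosine` is illustrative):

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


same_direction = cosine([1, 2], [2, 4])   # vectors point the same way
orthogonal = cosine([1, 0], [0, 1])       # vectors share no component
partial = cosine([1, 0], [1, 1])          # somewhere in between

print(same_direction, orthogonal, partial)
```

When all features are non-negative, as they are after min-max scaling, the score always falls between 0 and 1, which is worth keeping in mind when choosing threshold values.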

In [14]:
from sklearn.metrics.pairwise import cosine_similarity

# Similarity is computed on the normalized numeric features only (rating, episodes, members);
# genre_features and type_dummies are not part of this matrix
cosine_sim = cosine_similarity(normalized_num_features)

def recommend_anime(anime_title, top_n=5, threshold=0.3):
    # Find index of the target anime (case-insensitive search)
    idxs = df[df['name'].str.lower() == anime_title.lower()].index
    if len(idxs) == 0:
        return f"Anime '{anime_title}' not found."
    idx = idxs[0]
    
    # Compute similarity scores to all other anime
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Remove self and apply similarity threshold
    sim_scores = [(i, score) for i, score in sim_scores if i != idx and score >= threshold]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get indices for top N recommendations
    top_indices = [i for i, score in sim_scores[:top_n]]
    results = df.iloc[top_indices][['anime_id', 'name', 'genre', 'rating', 'members']]
    return results

# Example usage
recommendations = recommend_anime('Under World', top_n=5, threshold=0.2)
print(recommendations)


       anime_id                                  name  \
9330      28151              Kujakuou: Sengoku Tensei   
12287      9352                Tenshi no Habataki Jun   
4271       5002                    Bari Bari Densetsu   
7971      29973  Aya Hito Shiki to Iu na no Ishi Hata   
9806      28143                 Ochou Fujin no Gensou   

                                          genre  rating  members  
9330   Action, Demons, Historical, Supernatural    4.33      181  
12287                                    Hentai    4.33      201  
4271             Action, Drama, Shounen, Sports    6.76      385  
7971                                   Dementia    4.38      177  
9806                          Drama, Historical    4.24      156  
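
A full pairwise matrix for 12,294 anime holds about 151 million entries, roughly 1.2 GB as float64. One alternative is to compute similarity for the target row only, at query time. The sketch below shows the idea on a small made-up array; `top_similar` is a hypothetical helper, not the notebook's function.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up feature matrix: one row per item
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.5, 0.5],
])


def top_similar(features, idx, top_n=2, threshold=0.0):
    """Return (row index, score) pairs for the rows most similar to row idx."""
    # Compare the single target row against all rows: result has shape (1, n)
    scores = cosine_similarity(features[idx:idx + 1], features).ravel()
    scores[idx] = -np.inf                  # never recommend the item itself
    order = np.argsort(scores)[::-1]       # highest score first
    return [(int(i), float(scores[i])) for i in order[:top_n]
            if scores[i] >= threshold]


result = top_similar(features, idx=0, top_n=2)
print(result)
```

The returned indices can be passed to `df.iloc[...]` in the same way as `top_indices` in the function above.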


In [15]:
for t in [0.1, 0.2, 0.3, 0.5]:
    print(f"\nThreshold: {t}")
    rec = recommend_anime('Naruto', top_n=10, threshold=t)
    print(f"Found {len(rec)} recommendations")
    print(rec[['name', 'genre', 'rating']].head(3))  # Preview top 3



Threshold: 0.1
Found 10 recommendations
                                 name  \
1    Fullmetal Alchemist: Brotherhood   
582                            Bleach   
288                        Fairy Tail   

                                                 genre  rating  
1    Action, Adventure, Drama, Fantasy, Magic, Mili...    9.26  
582  Action, Comedy, Shounen, Super Power, Supernat...    7.95  
288  Action, Adventure, Comedy, Fantasy, Magic, Sho...    8.22  

Threshold: 0.2
Found 10 recommendations
                                 name  \
1    Fullmetal Alchemist: Brotherhood   
582                            Bleach   
288                        Fairy Tail   

                                                 genre  rating  
1    Action, Adventure, Drama, Fantasy, Magic, Mili...    9.26  
582  Action, Comedy, Shounen, Super Power, Supernat...    7.95  
288  Action, Adventure, Comedy, Fantasy, Magic, Sho...    8.22  

Threshold: 0.3
Found 10 recommendations
                           

### Analyzing Performance and Improvement Areas:

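If the recommendations are too generic, add more features to the similarity matrix. The version above compares only rating, episodes, and members, so genre and type are the natural additions.

In the runs above, thresholds of 0.1, 0.2, and 0.3 each return the full 10 recommendations, so the threshold only starts to shrink the list at higher values.

If several entries share a title, look up the target by anime_id instead of name, since the function uses only the first match.

You may also use user feedback or known anime groupings to tune the threshold or features.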

### Interview Questions:
1. Can you explain the difference between user-based and item-based collaborative filtering?

User-based collaborative filtering recommends items to a user by finding other users with similar tastes or behavior and suggesting items those users liked. For example, if you and another person both liked several of the same anime, the system recommends shows that they liked but you haven't watched yet.

Item-based collaborative filtering looks at the items instead. It finds items that users have rated similarly to the ones you already like and recommends those. So, if you liked "Naruto," the system finds anime whose rating patterns across many users resemble those of "Naruto" and recommends them.
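
The two approaches differ mainly in which axis of the user-item rating matrix is compared. A minimal sketch with made-up ratings, where 0 stands for "not rated" and is treated as an ordinary value for simplicity (`cosine_matrix` is an illustrative helper):

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are anime
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)


def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    unit = m / norms
    return unit @ unit.T


user_sim = cosine_matrix(ratings)     # user-based: compare rows (users)
item_sim = cosine_matrix(ratings.T)   # item-based: compare columns (anime)

print(user_sim.shape, item_sim.shape)
```

Here users 0 and 1 come out as more similar to each other than to user 2, and anime 0 and 1 are more similar to each other than to anime 2, because they received similar ratings from the same people.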

2. What is collaborative filtering, and how does it work?

Collaborative filtering recommends things (like anime or movies) based on the preferences of many users. The system collects likes, ratings, or viewing habits from many people, then predicts what you might enjoy by finding patterns among similar users or similar items. It doesn't need to know anything about the actual content, only what people liked, and it uses that information to suggest new things to try.
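
As a rough illustration of the prediction step, the sketch below estimates a missing rating as a similarity-weighted average of other users' ratings. The data and the function name `predict_user_based` are made up, and practical systems usually add refinements such as mean-centering and neighbor selection.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are anime; 0 means "not rated"
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)


def predict_user_based(ratings, user, item):
    """Predict a rating as the similarity-weighted mean of other users' ratings."""
    norms = np.linalg.norm(ratings, axis=1)
    # Cosine similarity between the target user and every user
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0                         # ignore the user's own row
    rated = ratings[:, item] > 0             # only users who rated this item
    weights = sims * rated
    if weights.sum() == 0:
        return None                          # nobody comparable rated the item
    return float(weights @ ratings[:, item] / weights.sum())


pred = predict_user_based(ratings, user=0, item=2)
print(pred)
```

User 0 is much closer to user 1 (who rated the item 1) than to user 2 (who rated it 5), so the prediction lands nearer the low end of the scale.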