# Anime Recommendation System using Cosine Similarity

## Data Loading and Exploration

In [1]:

import pandas as pd

# Load dataset
df = pd.read_csv('anime.csv')
df.head()


| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama' | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [2]:

# Dataset shape
print(df.shape)

# Data types
print(df.dtypes)

# Summary statistics (only a cell's last expression is displayed automatically)
df.describe()


| | anime_id | rating | members |
|---|---|---|---|
| count | 12294.0 | 12064.0 | 12294.0 |
| mean | 14058.221653 | 6.473902 | 18071.34 |
| std | 11455.294701 | 1.026746 | 54820.68 |
| min | 1.0 | 1.67 | 5.0 |
| 25% | 3484.25 | 5.88 | 225.0 |
| 50% | 10260.5 | 6.57 | 1550.0 |
| 75% | 24794.5 | 7.18 | 9437.0 |
| max | 34527.0 | 10.0 | 1013917.0 |


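The dataset contains 12,294 anime titles with their genres, type, episode count, rating, and popularity (members). Ratings range from 1.67 to 10.0 (mean 6.47), while member counts are heavily right-skewed: the median is 1,550 and the mean about 18,071, but the maximum exceeds one million. Because rating and members are later scaled and combined with the genre features, these distributions directly affect the similarity scores.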
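The skew in members matters once the column is min-max scaled. The sketch below applies min-max scaling to the minimum, quartiles, and maximum of members taken from the describe() output above. It also shows `np.log1p` before scaling, a common alternative for heavy-tailed counts; this log transform is an assumption for illustration and is not part of the notebook's pipeline.

```python
import numpy as np

# Min, 25%, 50%, 75% and max of `members` from the describe() output
members = np.array([5.0, 225.0, 1550.0, 9437.0, 1013917.0])

def min_max(x):
    # Same formula MinMaxScaler applies per column with the default (0, 1) range
    return (x - x.min()) / (x.max() - x.min())

linear = min_max(members)            # what the notebook effectively does
logged = min_max(np.log1p(members))  # hypothetical alternative: log-transform first

for m, a, b in zip(members, linear, logged):
    print(f"{m:>10.0f}  minmax={a:.4f}  log1p+minmax={b:.4f}")
```

Under plain min-max scaling, even the 75th-percentile title gets a value below 0.01, so members barely differentiates most titles; after `log1p`, the same quartiles spread out across the [0, 1] range.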

## Data Preprocessing

In [3]:

# Check missing values
df.isnull().sum()


| column | missing values |
|---|---|
| anime_id | 0 |
| name | 0 |
| genre | 62 |
| type | 25 |
| episodes | 0 |
| rating | 230 |
| members | 0 |


In [4]:

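# Fill missing values in the columns used as features
df['genre'] = df['genre'].fillna('Unknown')                # 62 missing genres become an explicit 'Unknown' token
df['rating'] = df['rating'].fillna(df['rating'].mean())    # 230 missing ratings imputed with the column mean
df['members'] = df['members'].fillna(df['members'].mean()) # no missing values here; kept as a safeguard
# 'type' (25 missing) is not used as a feature, so it is left unchanged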
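A quick way to confirm that imputation covered every feature column is to re-run the null count on just those columns. This matters because TfidfVectorizer rejects NaN documents, and MinMaxScaler passes NaNs through to the cosine similarity step, which then fails. The sketch below uses a small hypothetical frame instead of anime.csv; in the notebook the equivalent check would be `df[['genre', 'rating', 'members']].isnull().sum()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same kinds of gaps as anime.csv
toy = pd.DataFrame({
    'genre': ['Action, Comedy', np.nan, 'Drama'],
    'rating': [8.0, np.nan, 6.0],
    'members': [1000, 250, 50],
})

toy['genre'] = toy['genre'].fillna('Unknown')
toy['rating'] = toy['rating'].fillna(toy['rating'].mean())  # mean() skips NaN: (8.0 + 6.0) / 2

# Every column that feeds TF-IDF or MinMaxScaler must be NaN-free
feature_cols = ['genre', 'rating', 'members']
missing_after = toy[feature_cols].isnull().sum()
print(missing_after)
```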


## Feature Extraction

In [5]:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from scipy.sparse import hstack

# TF-IDF for genres
tfidf = TfidfVectorizer(stop_words='english')
genre_matrix = tfidf.fit_transform(df['genre'])

# Normalize numerical features
scaler = MinMaxScaler()
num_features = scaler.fit_transform(df[['rating', 'members']])

# Combine features
feature_matrix = hstack([genre_matrix, num_features])


## Recommendation System

In [6]:

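from sklearn.metrics.pairwise import cosine_similarity

# Dense n x n matrix: row i holds the similarity of title i to every title
cosine_sim = cosine_similarity(feature_matrix)

def recommend_with_threshold(title, threshold=0.3, top_n=5):
    """Return up to top_n titles whose cosine similarity to `title` is >= threshold."""
    if title not in df['name'].values:
        return "Anime not found"

    # First row matching the title; with the default RangeIndex this label
    # is also the row position in cosine_sim
    idx = df[df['name'] == title].index[0]
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Exclude the query title explicitly (rather than assuming it sorts first),
    # then apply the similarity threshold
    sim_scores = [s for s in sim_scores if s[0] != idx and s[1] >= threshold]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    anime_indices = [i for i, _ in sim_scores]
    return df[['name', 'genre', 'rating']].iloc[anime_indices]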
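`cosine_similarity` also accepts a single row as its first argument, so similarities can be computed per query instead of materializing the full n × n matrix up front. The sketch below assumes a small hypothetical CSR matrix in place of `feature_matrix`, and `top_similar` is an illustrative helper, not part of the notebook. Note that `hstack` returns a COO matrix by default, which may not support row indexing like `features[idx]`, so converting with `.tocsr()` first is the safe choice.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

def top_similar(features, idx, threshold=0.3, top_n=5):
    # Similarities of one title to all titles: shape (1, n) flattened to (n,)
    scores = cosine_similarity(features[idx], features).ravel()
    scores[idx] = -np.inf  # never recommend the query title itself
    candidates = np.flatnonzero(scores >= threshold)
    # Order surviving candidates by descending similarity, then keep top_n
    order = candidates[np.argsort(-scores[candidates], kind='stable')]
    return [(int(i), float(scores[i])) for i in order[:top_n]]

# Hypothetical stand-in for feature_matrix (4 titles, 3 features), stored as CSR
features = csr_matrix(np.array([
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.4],
    [0.0, 1.0, 0.5],
    [0.9, 0.1, 0.5],
]))

print(top_similar(features, 0, threshold=0.5, top_n=2))
```

Each query then costs one sparse row-times-matrix product, and memory stays proportional to the number of titles rather than its square.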


## Example Recommendation

In [7]:
recommend_with_threshold(df['name'].iloc[0], threshold=0.4)


| | name | genre | rating |
|---|---|---|---|
| 5805 | Wind: A Breath of Heart OVA | Drama, Romance, School, Supernatural | 6.35 |
| 6394 | Wind: A Breath of Heart (TV) | Drama, Romance, School, Supernatural | 6.14 |
| 1111 | Aura: Maryuuin Kouga Saigo no Tatakai | Comedy, Drama, Romance, School, Supernatural | 7.67 |
| 878 | Shakugan no Shana II (Second) | Action, Drama, Fantasy, Romance, School, Super... | 7.79 |
| 1201 | Angel Beats!: Another Epilogue | Drama, School, Supernatural | 7.63 |


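Different similarity thresholds were tried to control recommendation quality. Because titles below the threshold are removed before the top_n cut, raising the threshold never reorders the results; it only drops the weaker matches, so it may return fewer than top_n titles, all of them closer matches. Lowering it admits weaker matches, which can add diversity but may reduce relevance. With threshold=0.4, the example above still returned five titles, each sharing at least three of the query title's four genres.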
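To see how the threshold interacts with top_n, the sketch below applies the same filter-then-truncate logic to a list of similarity scores. The scores are hypothetical, chosen only to illustrate the effect, and `results_per_threshold` is an illustrative helper.

```python
import numpy as np

def results_per_threshold(scores, thresholds, top_n=5):
    # For each threshold, count how many recommendations survive the filter and the top_n cut
    scores = np.asarray(scores, dtype=float)
    counts = {}
    for t in thresholds:
        kept = np.sort(scores[scores >= t])[::-1][:top_n]
        counts[t] = len(kept)
    return counts

# Hypothetical similarities of other titles to one query title
scores = [0.92, 0.88, 0.61, 0.45, 0.41, 0.33, 0.12]
counts = results_per_threshold(scores, [0.3, 0.4, 0.5, 0.9])
print(counts)  # {0.3: 5, 0.4: 5, 0.5: 3, 0.9: 1}
```

Here moving from 0.3 to 0.4 leaves the five results unchanged; only from 0.5 upward does the threshold shorten the list.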

## Performance Analysis
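The recommendation system uses cosine similarity, which is cheap to compute on sparse, high-dimensional vectors such as the TF-IDF genre features. Precomputing the full pairwise matrix, however, takes O(n²) memory, about 1.2 GB of float64 values for 12,294 titles, so computing one row per query scales better to larger catalogs. TF-IDF gives rarer genres more weight, and min-max scaling puts rating and members on a common [0, 1] range. That scaling does not by itself balance their influence against the genre features, and because members is heavily skewed, most titles end up with a scaled value near 0. The system produces plausible genre-based recommendations in the example above, although no quantitative evaluation was performed. It also does not incorporate user feedback or collaborative filtering, which could further improve accuracy.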
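One way to make the relative influence of the numeric features explicit is to weight that block before stacking it with the genre features. The `num_weight` parameter, the `combine_features` helper, and the feature values below are hypothetical; the notebook itself effectively uses a weight of 1. The sketch shows that for two titles with identical genre vectors, the numeric block alone pulls their similarity below 1, by an amount that depends on the weight.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.metrics.pairwise import cosine_similarity

def combine_features(genre_matrix, num_features, num_weight=1.0):
    # num_weight is a hypothetical knob: it rescales the min-max-scaled numeric block
    # relative to the TF-IDF block, whose rows have unit L2 norm by default
    return hstack([genre_matrix, csr_matrix(num_features * num_weight)]).tocsr()

# Two hypothetical titles with identical genre vectors but different scaled rating/members
genre = csr_matrix(np.array([[0.6, 0.8], [0.6, 0.8]]))
nums = np.array([[0.92, 0.20], [0.30, 0.00]])

sims = {}
for w in (0.0, 0.5, 1.0):
    X = combine_features(genre, nums, num_weight=w)
    sims[w] = float(cosine_similarity(X[0], X[1])[0, 0])
print({w: round(s, 3) for w, s in sims.items()})
```

A weight like this would normally be chosen by inspecting or evaluating recommendations, not fixed in advance.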

## Interview Questions
1. Can you explain the difference between user-based and item-based collaborative filtering?
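User-based collaborative filtering recommends items that users with similar tastes have liked, while item-based filtering recommends items similar to those a user has already rated highly. User-based filtering tends to give more personalized, serendipitous recommendations. Item-based filtering tends to be more stable (item-to-item similarities change less often than user tastes), is cheaper to compute when there are far more users than items, and copes better with new users because it can recommend from just a few of their ratings. Neither approach can recommend new items that have no ratings yet.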

2. What is collaborative filtering, and how does it work?
Collaborative filtering is a recommender-system method that makes personalized, automated recommendations by predicting a user's interests from the preferences and behavior of other users. It works by finding similarities between users or between items in a large user-item interaction matrix (for example, ratings) and recommending items that similar users liked, using neighborhood methods (user-based or item-based) or latent-factor techniques such as matrix factorization.
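The difference between the two neighborhood approaches can be made concrete on a small rating matrix: user-based filtering compares rows (a users × users similarity matrix), item-based filtering compares columns (an items × items matrix). The sketch below assumes a hypothetical 5 × 4 matrix in which 0 means "not rated", plain cosine similarity, and an unadjusted similarity-weighted average for predictions; practical systems usually mean-center ratings first (for example, adjusted cosine).

```python
import numpy as np

# Hypothetical user x item rating matrix: 5 users, 4 items, 0 = not rated
R = np.array([
    [5.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 2.0],
    [5.0, 4.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def cosine_matrix(M):
    # Cosine similarity between the rows of M (unrated entries count as 0)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

user_sim = cosine_matrix(R)    # users x users, used by user-based CF
item_sim = cosine_matrix(R.T)  # items x items, used by item-based CF

def predict_user_based(R, user_sim, user, item):
    # Similarity-weighted average of other users' ratings for this item
    raters = np.flatnonzero((R[:, item] > 0) & (np.arange(R.shape[0]) != user))
    w = user_sim[user, raters]
    return float(w @ R[raters, item] / w.sum()) if w.sum() > 0 else 0.0

def predict_item_based(R, item_sim, user, item):
    # Similarity-weighted average of this user's ratings for other items
    rated = np.flatnonzero(R[user] > 0)
    w = item_sim[item, rated]
    return float(w @ R[user, rated] / w.sum()) if w.sum() > 0 else 0.0

print(user_sim.shape, item_sim.shape)  # (5, 5) (4, 4)
for item in (1, 3):  # items user 0 has not rated
    print(item,
          round(predict_user_based(R, user_sim, 0, item), 2),
          round(predict_item_based(R, item_sim, 0, item), 2))
```

Both predictors rank item 1 above item 3 for user 0, but they get there differently: the user-based version relies on the 5 × 5 user similarity matrix, the item-based version on the 4 × 4 item matrix, which is the one that stays smaller when users far outnumber items.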

## Conclusion
The anime recommendation system performs data preprocessing, feature extraction, and similarity-based recommendation. Additional data exploration and experiments with the similarity threshold helped clarify what drives recommendation quality. The content-based approach produces plausible results, and future improvements could include collaborative filtering and integration of user feedback.