# Otaku Recommender – AI-Based Anime Recommendation System

## Objective
The objective of this project is to design and implement an AI-powered recommendation system that suggests anime titles based on textual similarity using Natural Language Processing techniques.


## Problem Statement
With the rapid growth of anime content, users often struggle to discover shows that match their interests. Traditional genre-based filtering is limited and does not capture semantic meaning. This project addresses the problem using content-based recommendation techniques.


    

In [1]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


## Dataset Description
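The dataset (data/anime.csv) contains 2,999 anime entries with five columns: item_id, title, genres (a pipe-separated list such as Supernatural|Suspense|Psychological|Shounen), description (a plot synopsis), and image_url (a MyAnimeList cover image link). The titles, genres, and descriptions are used to compute similarity between anime for recommendation.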
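Each genres value packs several tags into one pipe-separated string, so splitting it is the first step for any genre-level inspection. The minimal pandas sketch below uses two genre strings copied from the head() output as stand-in data. The sample DataFrame is hypothetical; in the notebook the same calls would apply to df['genres'].

```python
import pandas as pd

# Hypothetical stand-in for df, using two genre strings from the head() output
sample = pd.DataFrame({
    "title": ["Death Note", "Fullmetal Alchemist: Brotherhood"],
    "genres": [
        "Supernatural|Suspense|Psychological|Shounen",
        "Action|Adventure|Drama|Fantasy|Military|Shounen",
    ],
})

# Split each pipe-separated string into a list of tags
genre_lists = sample["genres"].str.split("|")

# explode() gives every tag its own row, so value_counts() counts titles per genre
genre_counts = genre_lists.explode().value_counts()
print(genre_counts)
```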


In [2]:
df = pd.read_csv("data/anime.csv")
df.head()


| | item_id | title | genres | description | image_url |
|---|---|---|---|---|---|
| 0 | 16498 | Shingeki no Kyojin | Action\|Award Winning\|Drama\|Suspense\|Gore\|Milit... | Centuries ago, mankind was slaughtered to near... | https://cdn.myanimelist.net/images/anime/10/47... |
| 1 | 1535 | Death Note | Supernatural\|Suspense\|Psychological\|Shounen | Brutal murders, petty thefts, and senseless vi... | https://cdn.myanimelist.net/images/anime/1079/... |
| 2 | 5114 | Fullmetal Alchemist: Brotherhood | Action\|Adventure\|Drama\|Fantasy\|Military\|Shounen | After a horrific alchemy experiment goes wrong... | https://cdn.myanimelist.net/images/anime/1208/... |
| 3 | 30276 | One Punch Man | Action\|Comedy\|Adult Cast\|Parody\|Super Power\|Se... | The seemingly unimpressive Saitama has a rathe... | https://cdn.myanimelist.net/images/anime/12/76... |
| 4 | 38000 | Kimetsu no Yaiba | Action\|Award Winning\|Supernatural\|Historical\|S... | Ever since the death of his father, the burden... | https://cdn.myanimelist.net/images/anime/1286/... |


In [3]:
df.columns


Index(['item_id', 'title', 'genres', 'description', 'image_url'], dtype='object')

In [4]:
df.isnull().sum()


item_id        0
title          0
genres         0
description    2
image_url      0
dtype: int64

In [5]:
df['description'] = df['description'].fillna('')


In [6]:
df['combined_text'] = (
    df['title'] + ' ' +
    df['genres'] + ' ' +
    df['description']
)
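Concatenating the three fields works because TfidfVectorizer's default token pattern treats the pipe character as a separator. Genre tags therefore become ordinary lowercase word tokens alongside the title and description words. The sketch below inspects those tokens with build_analyzer(), using the same settings as the notebook's vectorizer. The input string is an abbreviated version of the first combined_text row.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same settings as the notebook's vectorizer. build_analyzer() returns the
# lowercasing + tokenization + stop-word pipeline that fit_transform applies.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
analyze = vectorizer.build_analyzer()

tokens = analyze("Shingeki no Kyojin Action|Award Winning|Drama")
print(tokens)
# ['shingeki', 'kyojin', 'action', 'award', 'winning', 'drama']
```

Two side effects are worth knowing. A multi-word genre such as Award Winning becomes two independent tokens. Romanized Japanese particles that coincide with English stop words (here "no") are silently dropped.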


In [7]:
df[['title', 'combined_text']].head(2)


| | title | combined_text |
|---|---|---|
| 0 | Shingeki no Kyojin | Shingeki no Kyojin Action\|Award Winning\|Drama\|... |
| 1 | Death Note | Death Note Supernatural\|Suspense\|Psychological... |


In [8]:
tfidf = TfidfVectorizer(
    stop_words='english',
    max_features=5000
)

tfidf_matrix = tfidf.fit_transform(df['combined_text'])


In [9]:
tfidf_matrix.shape


(2999, 5000)

## Model / System Design

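This project follows a content-based recommendation approach. The title, genres, and description of each anime are concatenated and transformed into a numerical vector using TF-IDF (Term Frequency–Inverse Document Frequency). English stop words are removed, and the vocabulary is capped at the 5,000 most frequent terms. Cosine similarity between these vectors then scores how alike two anime are, and the highest-scoring titles are returned as recommendations.

This design was chosen for three reasons. It is interpretable, because each vector dimension corresponds to a vocabulary term. It is efficient, because the TF-IDF matrix is sparse, so scoring one title against all 2,999 others is fast. It also needs no user interaction data.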
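Both formulas can be checked directly. With its defaults (raw term counts, smooth_idf=True, norm='l2'), scikit-learn's TfidfVectorizer weights term t in document d as tf(t, d) × (ln((1 + n) / (1 + df(t))) + 1). Here n is the number of documents and df(t) is the number of documents containing t. Each row is then scaled to unit length, so the cosine similarity of two rows reduces to their dot product. The sketch recomputes both quantities on a toy corpus, made up for illustration, and compares them with scikit-learn's output.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up toy corpus for illustration only
corpus = [
    "psychological suspense death notebook",
    "suspense drama death game",
    "comedy school parody",
]

vec = TfidfVectorizer()                  # defaults: raw tf, smooth_idf=True, norm="l2"
X = vec.fit_transform(corpus).toarray()

# Recompute by hand, reusing the fitted vocabulary so the columns line up
counts = CountVectorizer(vocabulary=vec.vocabulary_).fit_transform(corpus).toarray()
n_docs = counts.shape[0]
df_t = (counts > 0).sum(axis=0)                   # document frequency of each term
idf = np.log((1 + n_docs) / (1 + df_t)) + 1       # smoothed inverse document frequency
manual = counts * idf                             # tf * idf
manual = manual / np.linalg.norm(manual, axis=1, keepdims=True)  # L2-normalize rows

assert np.allclose(X, manual)

# With unit-length rows, cosine similarity equals the plain dot product
sims = cosine_similarity(X)
assert np.allclose(sims, X @ X.T)
print(np.round(sims, 3))
```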


In [10]:
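indices = pd.Series(df.index, index=df['title'])
# Keep the first row for each title. Series.drop_duplicates() would compare the
# (always unique) row positions, not the titles, so it would leave duplicates in place.
indices = indices[~indices.index.duplicated(keep='first')]

def recommend_anime(title, num_recommendations=5):
    if title not in indices:
        return "Title not found in dataset."

    idx = indices[title]
    # Similarity of the chosen title to every title in the dataset
    sim_scores = list(enumerate(cosine_similarity(tfidf_matrix[idx], tfidf_matrix)[0]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Drop the query title by position instead of assuming it is ranked first
    # (another title with an identical score could come before it)
    sim_scores = [s for s in sim_scores if s[0] != idx][:num_recommendations]

    anime_indices = [i[0] for i in sim_scores]
    return df[['title', 'genres']].iloc[anime_indices]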
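Two refinements are possible here, assuming tfidf_matrix keeps TfidfVectorizer's default L2 row normalization. First, linear_kernel (a plain dot product) returns the same scores as cosine_similarity without re-normalizing. Second, np.argpartition selects the top k without fully sorting every score. top_k_neighbors is a hypothetical helper and is shown on a made-up toy corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def top_k_neighbors(matrix, idx, k=5):
    """Hypothetical helper: positions of the k rows most similar to row `idx`, best first.

    Assumes L2-normalized rows (the TfidfVectorizer default), so the dot product
    computed by linear_kernel equals cosine similarity.
    """
    scores = linear_kernel(matrix[idx], matrix).ravel()
    scores[idx] = -np.inf                  # never recommend the query item itself
    k = min(k, scores.shape[0] - 1)
    if k <= 0:
        return np.array([], dtype=int)
    candidates = np.argpartition(-scores, k - 1)[:k]      # top k, unordered
    return candidates[np.argsort(-scores[candidates])]    # order by descending score

# Made-up toy corpus for illustration only
toy_corpus = [
    "death note suspense",
    "death parade suspense",
    "soul eater comedy",
    "death note rewrite suspense",
]
toy_matrix = TfidfVectorizer().fit_transform(toy_corpus)
print(top_k_neighbors(toy_matrix, 0, k=3))
```

In the notebook, df[['title', 'genres']].iloc[top_k_neighbors(tfidf_matrix, indices['Death Note'])] would play the role of recommend_anime('Death Note'). Tied scores may come back in a different order than with sorted().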


In [11]:
recommend_anime("Death Note")


| | title | genres |
|---|---|---|
| 1229 | Death Note: Rewrite | Supernatural\|Suspense\|Psychological\|Shounen |
| 50 | Death Parade | Drama\|Fantasy\|Suspense\|Adult Cast\|High Stakes ... |
| 2569 | Shinjiteita Nakama-tachi ni Dungeon Okuchi de ... | Action\|Fantasy |
| 2341 | Kami wa Game ni Ueteiru. | Fantasy\|Suspense\|High Stakes Game\|Strategy Game |
| 63 | Soul Eater | Action\|Comedy\|Fantasy\|School\|Shounen |


In [12]:
def recommend_by_text(query, num_recommendations=5):
    query_vec = tfidf.transform([query])
    similarity_scores = cosine_similarity(query_vec, tfidf_matrix).flatten()
    top_indices = similarity_scores.argsort()[::-1][:num_recommendations]
    return df[['title', 'genres']].iloc[top_indices]
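recommend_by_text always returns num_recommendations rows. This holds even when no query word is in the fitted vocabulary and every score is zero, in which case the order is only an artifact of argsort tie-breaking. Returning the scores makes weak matches visible. The search function below is a hypothetical variant, not from the original notebook. It skips zero-vector queries and zero-score rows and is demonstrated on a made-up toy corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def search(vectorizer, matrix, query, k=5):
    """Hypothetical variant of recommend_by_text that returns (row, score) pairs."""
    query_vec = vectorizer.transform([query])
    if query_vec.nnz == 0:
        # No query term is in the vocabulary: every score would be 0,
        # so any ranking would be meaningless.
        return []
    # Rows are L2-normalized by default, so the dot product is the cosine similarity
    scores = linear_kernel(query_vec, matrix).ravel()
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order if scores[i] > 0]

# Made-up toy corpus for illustration only
toy_corpus = [
    "death note suspense",
    "death parade suspense",
    "soul eater comedy",
    "death note rewrite suspense",
]
toy_vec = TfidfVectorizer()
toy_matrix = toy_vec.fit_transform(toy_corpus)

print(search(toy_vec, toy_matrix, "isekai romance"))  # no known words -> []
print(search(toy_vec, toy_matrix, "comedy"))
```

Calling search(tfidf, tfidf_matrix, "dark psychological thriller anime") in the notebook would show how strong the matches behind that query's results actually are.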


In [13]:
recommend_by_text("dark psychological thriller anime")


| | title | genres |
|---|---|---|
| 2652 | Animegataris | Comedy\|Otaku Culture\|Parody\|School |
| 2486 | 5-toubun no Hanayome∽ | Comedy\|Romance\|Harem\|School\|Shounen |
| 2823 | Nogizaka Haruka no Himitsu Purezza ♪ | Comedy\|Romance\|Otaku Culture\|School |
| 2798 | Gintama: Shiroyasha Koutan | Action\|Comedy\|Sci-Fi\|Historical\|Parody |
| 1426 | Tengen Toppa Gurren Lagann Movie 1: Gurren-hen | Sci-Fi\|Mecha |


## Evaluation & Analysis

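The recommendation system is evaluated qualitatively by checking whether the suggested anime share genres and themes with the query. The title-based query for Death Note returns partly relevant results. Death Note: Rewrite has the same genre list, two others (Death Parade and Kami wa Game ni Ueteiru.) share the Suspense genre, and Soul Eater shares Shounen. Shinjiteita Nakama-tachi ni Dungeon Okuchi de ... (Action|Fantasy) shares no genre at all. The free-text query "dark psychological thriller anime" performs worse: four of its five results are tagged Comedy, and none is tagged Psychological or Suspense. Because matching is purely lexical, results can follow incidental word overlap rather than meaning.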

Limitations include dependence on textual metadata and lack of explicit user preference feedback.
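The genre part of this qualitative check can be made repeatable by scoring the overlap between a query's genres and each recommendation's genres with the Jaccard index (shared tags divided by all distinct tags). genre_jaccard and mean_genre_overlap are hypothetical helpers. The metric captures only genre agreement, not thematic similarity. The example uses the Death Note recommendations listed earlier and leaves out Death Parade, whose genre list is truncated in that output.

```python
def genre_jaccard(genres_a: str, genres_b: str) -> float:
    """Hypothetical helper: Jaccard overlap of two pipe-separated genre strings."""
    a = set(filter(None, genres_a.split("|")))
    b = set(filter(None, genres_b.split("|")))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def mean_genre_overlap(query_genres: str, recommended_genres: list) -> float:
    """Hypothetical helper: average Jaccard overlap between a query and its recommendations."""
    if not recommended_genres:
        return 0.0
    return sum(genre_jaccard(query_genres, g) for g in recommended_genres) / len(recommended_genres)

death_note = "Supernatural|Suspense|Psychological|Shounen"
recommended = [
    "Supernatural|Suspense|Psychological|Shounen",      # Death Note: Rewrite
    "Action|Fantasy",                                   # Shinjiteita Nakama-tachi ni Dungeon Okuchi de ...
    "Fantasy|Suspense|High Stakes Game|Strategy Game",  # Kami wa Game ni Ueteiru.
    "Action|Comedy|Fantasy|School|Shounen",             # Soul Eater
]
print(round(mean_genre_overlap(death_note, recommended), 3))
# 0.317
```

Averaging this score over many query titles would give a rough single-number comparison between configurations, for example different max_features settings.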


## Ethical Considerations & Responsible AI

The dataset may exhibit popularity bias, favoring well-known anime over niche titles. No personal user data is collected or stored. The system is content-based and avoids sensitive or demographic profiling.


## Conclusion & Future Scope

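This project demonstrates an end-to-end, content-based anime recommendation system built with NLP techniques: missing-value handling, TF-IDF vectorization, and cosine-similarity ranking for both title and free-text queries. Future enhancements include embedding-based models that can match meaning beyond shared words, integration of user feedback, and real-time personalization.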
