# Anime Recommendation System



## This notebook demonstrates an anime recommendation engine using:

- **Content Based Filtering**
- **Text Vectorization**
- **Bag of Words**
- **Cosine Similarity**
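
As a rough illustration of how these pieces fit together, the sketch below builds bag-of-words count vectors for three made-up tag strings and compares them with cosine similarity. The titles, tags, and helper names (`bag_of_words`, `cosine`) are assumptions for illustration only; the notebook itself relies on scikit-learn for both steps.

```python
import math
from collections import Counter

# Hypothetical tag strings, one per title (not taken from the dataset)
toy_tags = {
    "A": "action fantasy shounen",
    "B": "action fantasy horror",
    "C": "romance sliceoflife",
}

def bag_of_words(text):
    """Count how often each whitespace-separated token occurs."""
    return Counter(text.split())

def cosine(u, v):
    """Cosine similarity between two token-count mappings."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors_toy = {name: bag_of_words(tags) for name, tags in toy_tags.items()}
sim_ab = cosine(vectors_toy["A"], vectors_toy["B"])  # two shared tags out of three
sim_ac = cosine(vectors_toy["A"], vectors_toy["C"])  # no shared tags
print(round(sim_ab, 3), sim_ac)
```

Titles that share more tags score closer to 1, and titles with no tags in common score 0, which is the property a content-based recommender ranks on.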

In [3]:
import numpy as np 
import pandas as pd 
import os

In [27]:
import warnings
warnings.filterwarnings('ignore')

# Import data
data_m = pd.read_csv("data/anime-dataset-2022/Anime.csv")
pd.set_option("display.max_columns", 200)

In [28]:
data_m.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18495 entries, 0 to 18494
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Rank             18495 non-null  int64  
 1   Name             18495 non-null  object 
 2   Japanese_name    7938 non-null   object 
 3   Type             18495 non-null  object 
 4   Episodes         9501 non-null   float64
 5   Studio           12018 non-null  object 
 6   Release_season   4116 non-null   object 
 7   Tags             18095 non-null  object 
 8   Rating           15364 non-null  float64
 9   Release_year     18112 non-null  float64
 10  End_year         2854 non-null   float64
 11  Description      18491 non-null  object 
 13  Related_Mange    7627 non-null   object 
 14  Related_anime    10063 non-null  object 
 15  Voice_actors     15309 non-null  object 
 16  staff            13005 non-null  object 
dtypes: float64(4), int64(1), object(12)
memory usage: 2.4+ MB


In [29]:
data_m.head()

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.6,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.6,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."


In [30]:
data_m.tail()

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.0,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",,2020.0,,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.0,,,Chinese Animation,,2010.0,,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.0,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18494,18495,Heisei Inu Monogatari Bow: Genshi Inu Monogata...,,Movie,,Nippon Animation,,"Comedy, Slice of Life, Dogs",,1994.0,,No synopsis yet - check back soon!,,,Heisei Inu Monogatari Bow,,


Looking at the observations

In [31]:
data_m["Tags"][2]

'Fantasy, Ancient China, Chinese Animation, Cultivation, Xianxia, Based on a Web Novel'

In [32]:
data_m["Description"][2]

"'The third season of Mo Dao Zu Shi.'"

In [33]:
data_m["Content_Warning"][2]

nan

In [34]:
data_m["Related_Mange"][2]

'Grandmaster of Demonic Cultivation: Mo Dao Zu Shi (Novel), The Master of Diabolism'

In [35]:
data_m["staff"][0]

'Koyoharu Gotouge : Original Creator, Haruo Sotozaki : Director, Akira Matsushima : Character Design, Aimer : Song Performance'

# Filtering the columns for the model

In [36]:
#### Essential features to be included
# Data for the ML algorithm
# Rank
# Name
# Type
# Tags
# Description
# staff (only the Original Creator is extracted later)

In [37]:
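# .copy() makes data_m an independent DataFrame, so the in-place edits below do not act on a view of the raw data
data_m = data_m[["Rank", "Name", "Type", "Tags", "Description", "staff"]].copy()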

In [38]:
# Count duplicates
data_m.duplicated().sum()

0

In [39]:
# Checking if any NaN exist
data_m.isnull().sum()

Rank              0
Name              0
Type              0
Tags            400
Description       4
staff          5490
dtype: int64

In [40]:
data_m["Type"].value_counts()

TV       5446
Movie    3577
Web      2488
OVA      2235
Music    2165
Other     990
DVD S     911
TV Sp     683
Name: Type, dtype: int64

# Data Cleaning

In [41]:
# Type variable has unnecessary space characters
data_m[data_m["Type"] == "TV   "]

Unnamed: 0,Rank,Name,Type,Tags,Description,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,"Action, Adventure, Fantasy, Shounen, Demons, H...",'Tanjiro and his friends accompany the Hashira...,"Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,TV,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",'The final arc of Fruits Basket.',"Natsuki Takaya : Original Creator, Yoshihide I..."
3,4,Fullmetal Alchemist: Brotherhood,TV,"Action, Adventure, Drama, Fantasy, Mystery, Sh...","""The foundation of alchemy is based on the law...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,TV,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",'The battle to retake Wall Maria begins now! W...,"Hajime Isayama : Original Creator, Tetsurou Ar..."
5,6,Jujutsu Kaisen,TV,"Action, Horror, Shounen, Curse, Exorcists, Mon...",'Although Yuji Itadori looks like your average...,"Gege Akutami : Original Creator, Seong-Hu Park..."
...,...,...,...,...,...,...
18481,18482,Goblin Slayer 2,TV,"Action, Adventure, Fantasy, Seinen, Dark Fanta...","It's fall, and the village's harvest festival ...",Kumo Kagyu : Original Creator
18488,18489,Shachiku-san wa Youjo Yuurei ni Iyasaretai.,TV,"Comedy, Slice of Life, Ghosts, Iyashikei, Non-...","The story follows the daily life of Fushihara,...",Imari Arita : Original Creator
18491,18492,Yi Tang Juchang: Sanguo Yanyi,TV,Chinese Animation,No synopsis yet - check back soon!,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,TV,"Chinese Animation, Family Friendly, Short Epis...",No synopsis yet - check back soon!,


In [42]:
# Collapsing repeated whitespace and stripping leading/trailing spaces in the Type column
data_m["Type"] = data_m["Type"].str.split().str.join(" ")

In [45]:
# unnecessary space characters are removed
data_m[data_m["Type"] == "TV"]

Unnamed: 0,Rank,Name,Type,Tags,Description,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,"Action, Adventure, Fantasy, Shounen, Demons, H...",'Tanjiro and his friends accompany the Hashira...,"Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,TV,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",'The final arc of Fruits Basket.',"Natsuki Takaya : Original Creator, Yoshihide I..."
3,4,Fullmetal Alchemist: Brotherhood,TV,"Action, Adventure, Drama, Fantasy, Mystery, Sh...","""The foundation of alchemy is based on the law...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,TV,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",'The battle to retake Wall Maria begins now! W...,"Hajime Isayama : Original Creator, Tetsurou Ar..."
5,6,Jujutsu Kaisen,TV,"Action, Horror, Shounen, Curse, Exorcists, Mon...",'Although Yuji Itadori looks like your average...,"Gege Akutami : Original Creator, Seong-Hu Park..."
...,...,...,...,...,...,...
18481,18482,Goblin Slayer 2,TV,"Action, Adventure, Fantasy, Seinen, Dark Fanta...","It's fall, and the village's harvest festival ...",Kumo Kagyu : Original Creator
18488,18489,Shachiku-san wa Youjo Yuurei ni Iyasaretai.,TV,"Comedy, Slice of Life, Ghosts, Iyashikei, Non-...","The story follows the daily life of Fushihara,...",Imari Arita : Original Creator
18491,18492,Yi Tang Juchang: Sanguo Yanyi,TV,Chinese Animation,No synopsis yet - check back soon!,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,TV,"Chinese Animation, Family Friendly, Short Epis...",No synopsis yet - check back soon!,


In [46]:
# Observations with no tags and only a placeholder description carry no usable signal,
# so we drop them from data_m
filtering_1 = data_m[ (data_m["Description"] == "'No synopsis yet - check back soon!'") & (data_m["Tags"].isnull())].index
data_m = data_m.drop(filtering_1)

In [47]:
data_m.isnull().sum()

Rank              0
Name              0
Type              0
Tags            267
Description       4
staff          5451
dtype: int64

In [48]:
# Let us drop the observations of type Music, OVA, and TV Sp, since they are mostly filler anime
filtering_2 = data_m[data_m["Type"].isin(["Music", "OVA", "TV Sp"])].index
data_m.drop(filtering_2,inplace = True)

In [49]:
data_m.isnull().sum()

Rank              0
Name              0
Type              0
Tags             31
Description       4
staff          4322
dtype: int64

In [50]:
# Same filter as before, for placeholder descriptions stored without the surrounding single quotes
filtering_3 = data_m[ (data_m["Description"] == "No synopsis yet - check back soon!") & (data_m["Tags"].isnull())].index
data_m.drop(filtering_3,inplace=True)
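
The placeholder description shows up in two spellings, with and without surrounding single quotes, which is why the filter is applied twice. A minimal sketch of doing it in one pass, assuming it is acceptable to strip quote characters before comparing; the toy frame and the names `PLACEHOLDER`, `toy`, and `toy_clean` are illustrative and not part of the notebook:

```python
import pandas as pd

PLACEHOLDER = "No synopsis yet - check back soon!"

# Toy frame containing both placeholder variants (values are illustrative)
toy = pd.DataFrame({
    "Description": ["'No synopsis yet - check back soon!'",
                    "No synopsis yet - check back soon!",
                    "'A real synopsis.'",
                    "No synopsis yet - check back soon!"],
    "Tags": [None, None, None, "Comedy"],
})

# Strip surrounding quote characters first, so one comparison covers both variants
normalized = toy["Description"].str.strip("'\"")
to_drop = toy[(normalized == PLACEHOLDER) & toy["Tags"].isnull()].index
toy_clean = toy.drop(to_drop)
print(list(toy_clean.index))  # rows 2 and 3 remain
```

Row 2 is kept because it has a real synopsis, and row 3 is kept because it still has tags.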

In [51]:
data_m[data_m["Tags"].isnull()]

Unnamed: 0,Rank,Name,Type,Tags,Description,staff
7910,7911,Himitsu no Akko-chan Movie,Movie,,"""Taisho's father plans to construct an apartme...","Fujio Akatsuka : Original Creator, Hiroki Shib..."
8325,8326,Himitsu no Akko-chan: Umi da! Obake da!! Natsu...,Movie,,"""Akko spends a summer with her friends at her ...","Fujio Akatsuka : Original Creator, Hiroki Shib..."
8341,8342,Nanaka 6/17: Ojamana Nanaka,DVD S,,'Unaired episode included in the DVD release o...,
10185,10186,Tobe! Kujira no Peek,Movie,,'The story of an albino whale used in a circus...,"Kouji Morimoto : Director, Satoru Utsunomiya :..."


In [52]:
# Drop the items with null tags
filtering_4 = data_m[data_m["Tags"].isnull()].index
data_m.drop(filtering_4,inplace=True)

In [53]:
data_m.isnull().sum()

Rank              0
Name              0
Type              0
Tags              0
Description       4
staff          4300
dtype: int64

In [54]:
data_m.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13356 entries, 0 to 18494
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Rank         13356 non-null  int64 
 1   Name         13356 non-null  object
 2   Type         13356 non-null  object
 3   Tags         13356 non-null  object
 4   Description  13352 non-null  object
 5   staff        9056 non-null   object
dtypes: int64(1), object(5)
memory usage: 730.4+ KB


In [55]:
# Not all the observations have usable descriptions.
# Most of them contain only the placeholder "No synopsis yet - check back soon!"
# (stored with or without surrounding single quotes), so we drop the column.

data_m.drop("Description", axis=1, inplace=True)

In [56]:
data_m.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13356 entries, 0 to 18494
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Rank    13356 non-null  int64 
 1   Name    13356 non-null  object
 2   Type    13356 non-null  object
 3   Tags    13356 non-null  object
 4   staff   9056 non-null   object
dtypes: int64(1), object(4)
memory usage: 626.1+ KB


Checking the staff variable.
- The staff column had 5490 missing values in the raw data, and 4300 remain after the cleaning above. We could get rid of the column entirely.
- However, in the world of anime, Original Creators such as Eiichiro Oda (One Piece), Hajime Isayama (Attack on Titan), Masashi Kishimoto (Naruto), and many more have strong manga sales throughout the world.

Why consider the Original Creator's name?
Viewers often look at the creator's name when choosing a new anime to watch. Creators leave their mark on their anime, and works by popular creators are often regarded as masterpieces.

**Therefore, for the observations whose staff variable lists an Original Creator, we'll keep that creator's name.**
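
The extraction described here can be sketched as a single helper. This is an illustrative alternative under stated assumptions, not the notebook's method: `original_creator` is a hypothetical name, it returns the first person credited exactly as "Original Creator", and it treats missing values as having no creator.

```python
def original_creator(staff):
    """Return the name credited as 'Original Creator', or None if absent."""
    if not isinstance(staff, str):  # missing values are NaN (a float) in pandas
        return None
    for entry in staff.split(","):
        # partition splits on the first ':' only and never raises
        name, _, role = entry.partition(":")
        if role.strip() == "Original Creator":
            return name.strip()
    return None

sample = ("Koyoharu Gotouge : Original Creator, Haruo Sotozaki : Director, "
          "Akira Matsushima : Character Design, Aimer : Song Performance")
print(original_creator(sample))
print(original_creator("Tomoko Takayama : Director"))
print(original_creator(float("nan")))
```

Applied with `Series.apply`, a helper like this would give one creator name (or `None`) per row without intermediate dict conversions.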

In [57]:
data_m["staff"].isna().sum()

4300

# Filtering of Staff variable

In [58]:
# Keeping only the rows that have staff information (copied so later in-place edits are safe)
data_m_mini = data_m[data_m["staff"].notnull()].copy()

In [59]:
data_m_mini.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9056 entries, 0 to 18489
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Rank    9056 non-null   int64 
 1   Name    9056 non-null   object
 2   Type    9056 non-null   object
 3   Tags    9056 non-null   object
 4   staff   9056 non-null   object
dtypes: int64(1), object(4)
memory usage: 424.5+ KB


In [60]:
data_m_mini["staff"][0]

'Koyoharu Gotouge : Original Creator, Haruo Sotozaki : Director, Akira Matsushima : Character Design, Aimer : Song Performance'

In [61]:
# We need to convert these to dict
data_m_mini["staff"][1]

'Natsuki Takaya : Original Creator, Yoshihide Ibata : Director & Episode Director & Storyboard, Taku Kishimoto : Screenplay & Series Composition, Masaru Yokoyama : Music, Masaru Shindou : Character Design & Chief Animation Director, Baek-Ryun Chae : Photography Director, Youko Koyama : Art Director, Mika Sugawara : Color Design'

In [62]:
data_m_mini

Unnamed: 0,Rank,Name,Type,Tags,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,"Action, Adventure, Fantasy, Shounen, Demons, H...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,TV,"Drama, Fantasy, Romance, Shoujo, Animal Transf...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,Web,"Fantasy, Ancient China, Chinese Animation, Cul...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,TV,"Action, Adventure, Drama, Fantasy, Mystery, Sh...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,TV,"Action, Fantasy, Horror, Shounen, Dark Fantasy...","Hajime Isayama : Original Creator, Tetsurou Ar..."
...,...,...,...,...,...
18484,18485,Tooriame,Movie,"Abstract, Shorts","Tomoko Takayama : Director, Juri Ito : Music, ..."
18486,18487,Better back then,Movie,"Shorts, Stop Motion Animation, Original Work","Naoaki Shibuta : Director, Chikara Uemizutaru ..."
18487,18488,Make My Day,Web,"Horror, Monsters, Original Work","Yasuo Ootagaki : Original Creator, Makoto Kane..."
18488,18489,Shachiku-san wa Youjo Yuurei ni Iyasaretai.,TV,"Comedy, Slice of Life, Ghosts, Iyashikei, Non-...",Imari Arita : Original Creator


In [63]:
# Resetting the index so rows are numbered 0..n-1 before converting staff from str to dict (the old index is kept as an "index" column)
data_m_mini.reset_index(inplace = True)

In [64]:
# Testing one dataset value for staff
# Convert to dict
try_1 = data_m_mini["staff"][1].split(",")
try_1 = {k.split(":")[0].strip() : k.split(":")[1].strip() for k in try_1}
print(try_1)

{'Natsuki Takaya': 'Original Creator', 'Yoshihide Ibata': 'Director & Episode Director & Storyboard', 'Taku Kishimoto': 'Screenplay & Series Composition', 'Masaru Yokoyama': 'Music', 'Masaru Shindou': 'Character Design & Chief Animation Director', 'Baek-Ryun Chae': 'Photography Director', 'Youko Koyama': 'Art Director', 'Mika Sugawara': 'Color Design'}


In [65]:
# Converting every staff string to a dict of {name: role}
data_m_mini["staff"] = data_m_mini["staff"].apply(
    lambda s: {k.split(":")[0].strip(): k.split(":")[1].strip() for k in s.split(",")}
)

In [66]:
data_m_mini["staff"][1]

{'Natsuki Takaya': 'Original Creator',
 'Yoshihide Ibata': 'Director & Episode Director & Storyboard',
 'Taku Kishimoto': 'Screenplay & Series Composition',
 'Masaru Yokoyama': 'Music',
 'Masaru Shindou': 'Character Design & Chief Animation Director',
 'Baek-Ryun Chae': 'Photography Director',
 'Youko Koyama': 'Art Director',
 'Mika Sugawara': 'Color Design'}

In [67]:
data_m_mini["staff"][1].keys()

dict_keys(['Natsuki Takaya', 'Yoshihide Ibata', 'Taku Kishimoto', 'Masaru Yokoyama', 'Masaru Shindou', 'Baek-Ryun Chae', 'Youko Koyama', 'Mika Sugawara'])

In [68]:
data_m_mini["staff"][1].values()

dict_values(['Original Creator', 'Director & Episode Director & Storyboard', 'Screenplay & Series Composition', 'Music', 'Character Design & Chief Animation Director', 'Photography Director', 'Art Director', 'Color Design'])

In [69]:
# Swapping keys and values in each dict: {name: role} -> {role: name}
data_m_mini["staff"] = data_m_mini["staff"].apply(
    lambda d: {role: name for name, role in d.items()}
)
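
One caveat of inverting a dict this way: roles are not guaranteed to be unique, so when two people share a role only the last one survives. The sketch below uses made-up names to show the effect, together with one possible alternative that keeps every name per role.

```python
# Hypothetical staff mapping in which two people share the same role
staff_by_name = {
    "Person A": "Original Creator",
    "Person B": "Director",
    "Person C": "Director",
}

# Same inversion as in the cell above: role -> name
name_by_role = {role: name for name, role in staff_by_name.items()}
print(name_by_role)

# Alternative that keeps every name per role
names_by_role = {}
for name, role in staff_by_name.items():
    names_by_role.setdefault(role, []).append(name)
print(names_by_role)
```

For this notebook the loss mainly matters if a title credits more than one Original Creator, in which case only one name ends up in the tags.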

In [70]:
data_m_mini.iloc[8000:8023,3:4]

Unnamed: 0,Type
8000,Movie
8001,TV
8002,Movie
8003,TV
8004,Web
8005,Web
8006,TV
8007,Movie
8008,TV
8009,TV


In [71]:
# Keeping only the "Original Creator" entry of each staff dict
data_m_mini["staff"] = data_m_mini["staff"].apply(
    lambda d: {role: name for role, name in d.items() if role == "Original Creator"}
)

In [72]:
data_m_mini.iloc[1:40,3:4]

Unnamed: 0,Type
1,TV
2,Web
3,TV
4,TV
5,TV
6,TV
7,TV
8,Movie
9,TV
10,Movie


In [73]:
# Dropping the rows whose staff dict is empty (no Original Creator)
filtering_5 = data_m_mini[data_m_mini["staff"] == {}].index
data_m_mini.drop(filtering_5,inplace = True)

In [74]:
# Reset index again
data_m_mini.reset_index(inplace = True)

In [75]:
# Keeping only the name of the Original Creator
data_m_mini["staff"] = data_m_mini["staff"].apply(lambda d: d["Original Creator"])

*We have successfully extracted the Original Creator's name.*

In [76]:
data_m_mini

Unnamed: 0,level_0,index,Rank,Name,Type,Tags,staff
0,0,0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,"Action, Adventure, Fantasy, Shounen, Demons, H...",Koyoharu Gotouge
1,1,1,2,Fruits Basket the Final Season,TV,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",Natsuki Takaya
2,2,2,3,Mo Dao Zu Shi 3,Web,"Fantasy, Ancient China, Chinese Animation, Cul...",Mo Xiang Tong Xiu
3,3,3,4,Fullmetal Alchemist: Brotherhood,TV,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",Hiromu Arakawa
4,4,4,5,Attack on Titan 3rd Season: Part II,TV,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",Hajime Isayama
...,...,...,...,...,...,...,...
3194,9037,18443,18444,The Imaginary,Movie,Based on a Novel,A. F. Harrold
3195,9049,18481,18482,Goblin Slayer 2,TV,"Action, Adventure, Fantasy, Seinen, Dark Fanta...",Kumo Kagyu
3196,9050,18483,18484,Peleliu: Rakuen no Guernica,Other,"Action, Shounen, Island, War, World War 2, Bas...",Masao Hiratsuka
3197,9053,18487,18488,Make My Day,Web,"Horror, Monsters, Original Work",Yasuo Ootagaki


In [77]:
data_m_mini["Tags"] = data_m_mini["Tags"] +", "+ data_m_mini["staff"]

In [78]:
data_m_mini.drop(labels = "staff",axis = 1, inplace= True)

In [79]:
data_m.drop("staff",axis = 1, inplace = True)

In [80]:
data_m["Tags"][0]

'Action, Adventure, Fantasy, Shounen, Demons, Historical, Martial Arts, Orphans, Siblings, Swordplay, Based on a Manga, Explicit Violence'

In [81]:
data_m_mini["Tags"][0]

'Action, Adventure, Fantasy, Shounen, Demons, Historical, Martial Arts, Orphans, Siblings, Swordplay, Based on a Manga, Explicit Violence, Koyoharu Gotouge'

In [82]:
data_m.set_index(['Rank','Name'], inplace=True)
data_m.update(data_m_mini.set_index(['Rank','Name']))
data_m.reset_index(inplace=True)
data_m

Unnamed: 0,Rank,Name,Type,Tags
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,"Action, Adventure, Fantasy, Shounen, Demons, H..."
1,2,Fruits Basket the Final Season,TV,"Drama, Fantasy, Romance, Shoujo, Animal Transf..."
2,3,Mo Dao Zu Shi 3,Web,"Fantasy, Ancient China, Chinese Animation, Cul..."
3,4,Fullmetal Alchemist: Brotherhood,TV,"Action, Adventure, Drama, Fantasy, Mystery, Sh..."
4,5,Attack on Titan 3rd Season: Part II,TV,"Action, Fantasy, Horror, Shounen, Dark Fantasy..."
...,...,...,...,...
13351,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,Web,"Action, Ancient China, Chinese Animation, Hist..."
13352,18492,Yi Tang Juchang: Sanguo Yanyi,TV,Chinese Animation
13353,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,TV,"Chinese Animation, Family Friendly, Short Epis..."
13354,18494,Chengshi Jiyi Wo Men de Jieri,TV,"Chinese Animation, Family Friendly, Short Epis..."
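
The merge step works because `DataFrame.update` aligns on the index and overwrites only the matching rows with non-missing values. A minimal sketch on toy frames, where `full` and `subset` stand in for `data_m` and `data_m_mini` and all values are illustrative:

```python
import pandas as pd

# Toy stand-ins for data_m and data_m_mini (values are illustrative)
full = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Tags": ["Action", "Drama", "Comedy"],
})
subset = pd.DataFrame({
    "Rank": [1, 3],
    "Name": ["A", "C"],
    "Tags": ["Action, Creator X", "Comedy, Creator Y"],
})

# update() aligns on the index, so both frames are keyed by (Rank, Name) first
full = full.set_index(["Rank", "Name"])
full.update(subset.set_index(["Rank", "Name"]))
full = full.reset_index()
print(full["Tags"].tolist())
```

The row for title "B" has no counterpart in `subset`, so its tags are left unchanged.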


In [83]:
# Remove the spaces inside tags so that each multi-word tag becomes a single token
data_m["Tags"] = data_m["Tags"].str.replace(" ", "", regex=False)

In [84]:
# Replace the commas between tags with spaces
data_m["Tags"] = data_m["Tags"].str.replace(",", " ", regex=False)

In [85]:
data_m["Tags"] = data_m["Tags"].apply(lambda x: x.lower())
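
The three normalization steps above can be summarized in one small function. This is only a sketch: `normalize_tags` is a hypothetical helper, and unlike the cells above it also discards empty entries such as those left by doubled commas.

```python
def normalize_tags(tags):
    """Turn 'Dark Fantasy, Based on a Manga' into 'darkfantasy basedonamanga'."""
    tokens = [tag.replace(" ", "").lower() for tag in tags.split(",")]
    return " ".join(token for token in tokens if token)

example = "Fantasy, Ancient China, Chinese Animation, Cultivation, Xianxia, Based on a Web Novel"
print(normalize_tags(example))
```

Gluing the words of a tag together matters for the bag-of-words step: "Ancient China" becomes the single token `ancientchina` instead of two unrelated words.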

**Next let us make a recommendation model**

In [86]:
data = data_m.copy()

In [87]:
data

Unnamed: 0,Rank,Name,Type,Tags
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,TV,action adventure fantasy shounen demons histor...
1,2,Fruits Basket the Final Season,TV,drama fantasy romance shoujo animaltransformat...
2,3,Mo Dao Zu Shi 3,Web,fantasy ancientchina chineseanimation cultivat...
3,4,Fullmetal Alchemist: Brotherhood,TV,action adventure drama fantasy mystery shounen...
4,5,Attack on Titan 3rd Season: Part II,TV,action fantasy horror shounen darkfantasy isol...
...,...,...,...,...
13351,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,Web,action ancientchina chineseanimation historica...
13352,18492,Yi Tang Juchang: Sanguo Yanyi,TV,chineseanimation
13353,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,TV,chineseanimation familyfriendly shortepisodes
13354,18494,Chengshi Jiyi Wo Men de Jieri,TV,chineseanimation familyfriendly shortepisodes


### Text Vectorization
### Bag of words

In [88]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 13500,stop_words="english")

In [89]:
cv.fit_transform(data["Tags"]).toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [90]:
cv.fit_transform(data["Tags"]).toarray().shape

(13356, 1931)

In [91]:
vectors = cv.fit_transform(data["Tags"]).toarray()

In [92]:
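# Note: get_feature_names() was removed in scikit-learn 1.2; on newer versions use get_feature_names_out()
cv.get_feature_names()[0:10]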

['15thcentury',
 '16thcentury',
 '17thcentury',
 '18thcentury',
 '19thcentury',
 'abiumeda',
 'abstract',
 'acomomota',
 'acting',
 'action']

In [93]:
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(vectors)
similarity

array([[1.        , 0.19611614, 0.10482848, ..., 0.        , 0.        ,
        0.        ],
       [0.19611614, 1.        , 0.08908708, ..., 0.        , 0.        ,
        0.        ],
       [0.10482848, 0.08908708, 1.        , ..., 0.21821789, 0.21821789,
        0.        ],
       ...,
       [0.        , 0.        , 0.21821789, ..., 1.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.21821789, ..., 1.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])

In [94]:
similarity.shape

(13356, 13356)
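
A dense 13356 × 13356 matrix of 64-bit floats takes roughly 1.4 GB (13356² × 8 bytes). If memory is a concern, one option is to keep the `CountVectorizer` output sparse and compute similarities for a single title at a time. The sketch below uses a toy corpus; `toy_tags`, `cv_toy`, and `row_scores` are illustrative names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for data["Tags"] (values are illustrative)
toy_tags = [
    "action fantasy shounen",
    "action fantasy horror",
    "romance sliceoflife",
]

cv_toy = CountVectorizer()
sparse_vectors = cv_toy.fit_transform(toy_tags)  # SciPy sparse matrix, no .toarray()

# Similarity of row 0 against every row: n_titles scores instead of an n x n matrix
row_scores = cosine_similarity(sparse_vectors[0], sparse_vectors).ravel()
print(row_scores.round(3))
```

`cosine_similarity` accepts sparse input directly, so the dense `.toarray()` conversion is not required for this step.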

In [95]:
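def Recommender(user_input):
    # Row position of the requested title (raises IndexError if the name is not in data)
    Index_of_anime = data[data["Name"] == user_input].index[0]
    Similarity_score = similarity[Index_of_anime]
    # Rank all titles by similarity, drop the query itself, and keep the top 5
    Sorted_scores = sorted(enumerate(Similarity_score), reverse=True, key=lambda x: x[1])
    Sorted_scores = [s for s in Sorted_scores if s[0] != Index_of_anime][:5]
    Recommended_Anime = []

    for i in Sorted_scores:
        Recommended_Anime.append(data.iloc[i[0]].Name)
    return Recommended_Anime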
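
A variant of the function above can be sketched with NumPy's `argsort`, returning an empty list for unknown titles instead of raising. `top_k_similar` and its parameters are hypothetical names, and the toy similarity matrix is made up for illustration.

```python
import numpy as np

def top_k_similar(names, similarity_matrix, title, k=5):
    """Return the k names most similar to `title`, or [] if the title is unknown."""
    names = list(names)
    if title not in names:
        return []
    idx = names.index(title)
    scores = np.asarray(similarity_matrix[idx], dtype=float).copy()
    scores[idx] = -np.inf  # never recommend the query itself
    order = np.argsort(-scores, kind="stable")[:k]
    return [names[i] for i in order]

# Toy inputs (illustrative values, not taken from the dataset)
toy_names = ["A", "B", "C", "D"]
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.0],
    [0.5, 0.3, 0.0, 1.0],
])
print(top_k_similar(toy_names, toy_similarity, "A", k=2))
print(top_k_similar(toy_names, toy_similarity, "Unknown"))
```

Masking the query's own score, rather than skipping the first sorted entry, keeps the result correct even when another title has an identical tag vector and ties at similarity 1.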

In [96]:
Recommender("Jujutsu Kaisen")

['Jujutsu Kaisen 0',
 'Ga-Rei-Zero',
 "Hell's Paradise: Jigokuraku PV",
 'Corpse Princess: Kuro',
 "JoJo's Bizarre Adventure: Phantom Blood"]

In [98]:
Recommender("Attack on Titan 3rd Season: Part II")

['Attack on Titan 2nd Season',
 'Attack on Titan 3rd Season',
 'Attack on Titan Movie 3: The Roar of Awakening',
 'Attack on Titan Movie 2: The Wings of Freedom',
 'Attack on Titan Movie 1: Crimson Bow and Arrow']

In [99]:
tv_data = data_m[data_m["Type"] == "TV"]

In [104]:
tv_data = tv_data.sample(frac=1)

In [105]:
tv_data.head(10)

Unnamed: 0,Rank,Name,Type,Tags
839,985,New Game!!,TV,comedy seinen coworkers iyashikei videogameind...
6746,9084,Digimon Savers,TV,action adventure fantasy shounen familyfriendl...
1702,2041,Gokusen,TV,comedy josei all-boysschool delinquents gangs ...
2034,2469,Fafner Exodus 2nd Season,TV,action mecha scifi aliens animeism post-apocal...
9445,13489,Anisava,TV,comedy romance sliceoflife animalprotagonists ...
130,139,Dororo (2019),TV,action adventure shounen 15thcentury curse dar...
860,1009,Restaurant to Another World 2,TV,fantasy episodic foodandbeverage iyashikei non...
10708,15390,Lata Dawang Qiyu Ji,TV,animalprotagonists anthropomorphic chineseanim...
1246,1473,Persona 4 the Animation,TV,action fantasy mystery contemporaryfantasy sup...
6378,8532,Occultic;Nine,TV,mystery scifi socialmedia basedonalightnovel c...


In [101]:
Recommender('Haikyuu!! Second Season')

['Haikyuu!!',
 'Haikyuu!! Karasuno High School vs Shiratorizawa Academy',
 'Haikyuu!! Movie 3: Talent and Sense',
 'Haikyuu!! Movie 2: Shousha to Haisha',
 'Haikyuu!! Movie 1: Owari to Hajimari']

In [106]:
Recommender('Dororo (2019)')

['Dororo',
 'Orient',
 "JoJo's Bizarre Adventure (2012)",
 'Samurai Deeper Kyo',
 "JoJo's Bizarre Adventure: Stardust Crusaders"]