## Description

To build an anime recommendation model, we need to understand the following concepts:

1. Feature Extraction: Mapping textual data to real-valued vectors is called feature extraction.
2. Bag of Words (BoW): A representation built from the vocabulary of unique words in the text corpus, in which each document is described by how many times each vocabulary word occurs in it.
3. **TfidfVectorizer()** (Term Frequency-Inverse Document Frequency): This scikit-learn class converts text into vectors by counting how often each word appears in a document (TF) and down-weighting words that appear in many documents (IDF).

  TF-IDF value of a term = TF x IDF
4. Term Frequency (TF) = (number of times term 't' appears in a document) / (number of terms in the document)
5. Inverse Document Frequency (IDF) = log(N/n), where 'N' is the total number of documents and 'n' is the number of documents in which term 't' appears.

  The IDF value of a rare word is high, whereas the IDF of a frequent word is low.
6. **Cosine Similarity**: The dot product of two vectors divided by the product of their lengths (magnitudes).

  For a given anime, we compute its similarity score against every other anime; sorting these scores in descending order puts the most similar titles first.
7. **difflib**: A Python standard-library module whose `get_close_matches()` function returns the closest matches to a given string from a list of candidate strings.

In [None]:
#importing pandas and loading the dataset

import pandas as pd

dataset = pd.read_csv('/content/Anime.csv')

In [None]:
#now we have imported the dataset, let's view it

dataset

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.60,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.60,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.0,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",,2020.0,,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.0,,,Chinese Animation,,2010.0,,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.0,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,,,,"Chinese Animation, Family Friendly, Short Epis...",,2020.0,,No synopsis yet - check back soon!,,,,,


In [None]:
#let's see the basic information of our dataset

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18495 entries, 0 to 18494
Data columns (total 17 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Rank             18495 non-null  int64  
 1   Name             18495 non-null  object 
 2   Japanese_name    7938 non-null   object 
 3   Type             18495 non-null  object 
 4   Episodes         9501 non-null   float64
 5   Studio           12018 non-null  object 
 6   Release_season   4116 non-null   object 
 7   Tags             18095 non-null  object 
 8   Rating           15364 non-null  float64
 9   Release_year     18112 non-null  float64
 10  End_year         2854 non-null   float64
 11  Description      18491 non-null  object 
 13  Related_Mange    7627 non-null   object 
 14  Related_anime    10063 non-null  object 
 15  Voice_actors     15309 non-null  object 
 16  staff            13005 non-null  object 
dtypes: float64(4), int64(1), object(12)
memory usage: 2.4+ MB


In [None]:
#let's specifically check the null values

dataset.isnull().sum()

Rank                   0
Name                   0
Japanese_name      10557
Type                   0
Episodes            8994
Studio              6477
Release_season     14379
Tags                 400
Rating              3131
Release_year         383
End_year           15641
Description            4
Related_Mange      10868
Related_anime       8432
Voice_actors        3186
staff               5490
dtype: int64

In [None]:
#let's view the statistical information of our dataset

dataset.describe()

Unnamed: 0,Rank,Episodes,Rating,Release_year,End_year
count,18495.0,9501.0,15364.0,18112.0,2854.0
mean,9248.0,20.92085,3.355133,2006.520318,2004.256132
std,5339.19095,37.990858,0.400624,15.189537,13.257484
min,1.0,1.0,0.96,1907.0,1962.0
25%,4624.5,2.0,3.13,2001.0,1996.0
50%,9248.0,12.0,3.36,2012.0,2007.0
75%,13871.5,26.0,3.59,2017.0,2015.0
max,18495.0,800.0,4.6,2023.0,2022.0


In [None]:
#Let's find the lowest rating and view the name of the anime that has it
min_rating = dataset['Rating'].min()

min_rated_anime = dataset[dataset.Rating == min_rating]['Name']
print(min_rated_anime)

15368    Tenkuu Danzai Skelter Heaven
Name: Name, dtype: object


In [None]:
#let's view the number of unique values in the 'Type' column

dataset['Type'].nunique()

8

In [None]:
#let's view the unique value counts in the 'Type' column

dataset['Type'].value_counts()

TV       5446
Movie    3577
Web      2488
OVA      2235
Music    2165
Other     990
DVD S     911
TV Sp     683
Name: Type, dtype: int64

In [None]:
#let's view the no.of unique values in Studio column

dataset['Studio'].nunique()

745

In [None]:
#let's view the unique value counts in the 'Studio' column

dataset['Studio'].value_counts()

Toei Animation       737
Sunrise              476
J.C.Staff            382
TMS Entertainment    364
MADHOUSE             357
                    ... 
Studio Shelter         1
Fall 2005              1
Studio Bogey           1
Studio Yamato          1
Spring 1997            1
Name: Studio, Length: 745, dtype: int64
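The counts above include `Fall 2005` and `Spring 1997`, which look like release seasons rather than studio names, so the Studio column appears to contain some misplaced values. A quick way to flag them is sketched below on a toy Series; the regex assumes the `<Season> <Year>` pattern seen in this output and may not catch every misplaced value in the real column.

```python
import pandas as pd

# Toy Series mimicking the Studio column; the last values mirror the output above
studios = pd.Series(["Toei Animation", "Sunrise", "Fall 2005", "Spring 1997", None])

# Flag values that look like "<Season> <Year>" instead of a studio name;
# na=False treats missing values as non-matching
looks_like_season = studios.str.fullmatch(r"(Spring|Summer|Fall|Winter) \d{4}", na=False)
print(studios[looks_like_season].tolist())
```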

In [None]:
#Let's select the relevant features for the recommendation
selected_features = ['Tags','Type', 'Content_Warning', 'Voice_actors', 'Related_Mange', 'Related_anime']


In [None]:
#let's display the selected columns only

dataset[selected_features].head(10)

Unnamed: 0,Tags,Type,Content_Warning,Voice_actors,Related_Mange,Related_anime
0,"Action, Adventure, Fantasy, Shounen, Demons, H...",TV,Explicit Violence,"Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...",Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ..."
1,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",TV,"Emotional Abuse,, Mature Themes,, Physical Abu...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se..."
2,"Fantasy, Ancient China, Chinese Animation, Cul...",Web,,"Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...",Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q"
3,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",TV,"Animal Abuse,, Mature Themes,, Violence,, Dome...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful..."
4,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",TV,"Cannibalism,, Explicit Violence","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A..."
5,"Action, Horror, Shounen, Curse, Exorcists, Mon...",TV,Explicit Violence,"Megumi Fushiguro : Yuuma Uchida, Nobara Kugisa...","Jujutsu Kaisen 0, Jujutsu Kaisen","Jujutsu Kaisen (2018), Juju Sanpo, Eve: Kaikai..."
6,"Action, Drama, Fantasy, Horror, Shounen, Dark ...",TV,,"Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...",Attack on Titan,"Attack on Titan, Attack on Titan 2nd Season, A..."
7,"Action, Drama, Fantasy, Horror, Shounen, Dark ...",TV,"Explicit Violence,, Mature Themes,, Physical A...","Eren Jaeger : Yuuki Kaji, Armin Arlelt : Marin...",Attack on Titan,"Attack on Titan, Attack on Titan 2nd Season, A..."
8,"Action, Drama, Fantasy, Shounen, Demons, Histo...",Movie,"Mature Themes,, Suicide,, Violence","Inosuke Hashibira : Yoshitsugu Matsuoka, Kyouj...","Gotouge Koyoharu Tanpenshuu, Demon Slayer: Kim...","Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ..."
9,"Shounen, Sports, Animeism, School Club, School...",TV,,"Shoyo Hinata : Ayumu Murase, Tobio Kageyama : ...","Haikyuu!! (Pilot), Haikyuu!!, Haikyuu-bu!!","Haikyuu!!, Haikyuu!! Lev Kenzan!, Haikyuu!! Se..."


In [None]:
#let's check Null Values in selected columns 
dataset[selected_features].isna().sum()

Tags                 400
Type                   0
Voice_actors        3186
Related_Mange      10868
Related_anime       8432
dtype: int64

In [None]:
#Replace the null values with an empty string

for feature in selected_features:
  dataset[feature] = dataset[feature].fillna('')

#now let's view the dataset again after filling the missing values
dataset[selected_features].head(10)

Unnamed: 0,Tags,Type,Content_Warning,Voice_actors,Related_Mange,Related_anime
0,"Action, Adventure, Fantasy, Shounen, Demons, H...",TV,Explicit Violence,"Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...",Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ..."
1,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",TV,"Emotional Abuse,, Mature Themes,, Physical Abu...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se..."
2,"Fantasy, Ancient China, Chinese Animation, Cul...",Web,,"Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...",Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q"
3,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",TV,"Animal Abuse,, Mature Themes,, Violence,, Dome...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful..."
4,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",TV,"Cannibalism,, Explicit Violence","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A..."
5,"Action, Horror, Shounen, Curse, Exorcists, Mon...",TV,Explicit Violence,"Megumi Fushiguro : Yuuma Uchida, Nobara Kugisa...","Jujutsu Kaisen 0, Jujutsu Kaisen","Jujutsu Kaisen (2018), Juju Sanpo, Eve: Kaikai..."
6,"Action, Drama, Fantasy, Horror, Shounen, Dark ...",TV,,"Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...",Attack on Titan,"Attack on Titan, Attack on Titan 2nd Season, A..."
7,"Action, Drama, Fantasy, Horror, Shounen, Dark ...",TV,"Explicit Violence,, Mature Themes,, Physical A...","Eren Jaeger : Yuuki Kaji, Armin Arlelt : Marin...",Attack on Titan,"Attack on Titan, Attack on Titan 2nd Season, A..."
8,"Action, Drama, Fantasy, Shounen, Demons, Histo...",Movie,"Mature Themes,, Suicide,, Violence","Inosuke Hashibira : Yoshitsugu Matsuoka, Kyouj...","Gotouge Koyoharu Tanpenshuu, Demon Slayer: Kim...","Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ..."
9,"Shounen, Sports, Animeism, School Club, School...",TV,,"Shoyo Hinata : Ayumu Murase, Tobio Kageyama : ...","Haikyuu!! (Pilot), Haikyuu!!, Haikyuu-bu!!","Haikyuu!!, Haikyuu!! Lev Kenzan!, Haikyuu!! Se..."


In [None]:
#let's view the number of rows and columns: shape

dataset[selected_features].shape

(18495, 6)

In [None]:
#Combining all the 6 selected features into a single text string per anime

combined_features = (dataset['Tags'] + ' ' + dataset['Type'] + ' ' + dataset['Content_Warning'] + ' '
                     + dataset['Voice_actors'] + ' ' + dataset['Related_Mange'] + ' ' + dataset['Related_anime'])
combined_features


0        Action, Adventure, Fantasy, Shounen, Demons, H...
1        Drama, Fantasy, Romance, Shoujo, Animal Transf...
2        Fantasy, Ancient China, Chinese Animation, Cul...
3        Action, Adventure, Drama, Fantasy, Mystery, Sh...
4        Action, Fantasy, Horror, Shounen, Dark Fantasy...
                               ...                        
18490    Action, Ancient China, Chinese Animation, Hist...
18491                          Chinese Animation TV       
18492    Chinese Animation, Family Friendly, Short Epis...
18493    Chinese Animation, Family Friendly, Short Epis...
18494    Comedy, Slice of Life, Dogs Movie    Heisei In...
Length: 18495, dtype: object
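The same "join every selected column with a space" step can be written without spelling out each column name. A minimal sketch on a hypothetical two-row frame standing in for `dataset[selected_features]`, assuming all selected columns hold strings once nulls are filled:

```python
import pandas as pd

# Hypothetical two-row frame standing in for dataset[selected_features]
frame = pd.DataFrame({
    "Tags": ["Action, Fantasy", None],
    "Type": ["TV", "Movie"],
    "Content_Warning": [None, "Violence"],
})

# Fill missing values with empty strings, then join each row's fields with a single space
combined = frame.fillna("").apply(" ".join, axis=1)
print(combined.tolist())
```

Empty fields still contribute their separating space (for example a leading space in the second row); `TfidfVectorizer` ignores that extra whitespace when tokenizing.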

In [None]:
#let's now view the combined_features' shape

combined_features.shape

(18495,)

In [None]:
#now let's convert the text data to feature vectors, to find the cosine similarity 
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
feature_vectors = vectorizer.fit_transform(combined_features)

print(feature_vectors)

  (0, 37112)	0.17553082272799386
  (0, 24908)	0.198423243547306
  (0, 24858)	0.028371916747573223
  (0, 39394)	0.34244816187356303
  (0, 26402)	0.12111099306578915
  (0, 18803)	0.4379211035696903
  (0, 33715)	0.3903732608587652
  (0, 7671)	0.2796679903339309
  (0, 28417)	0.05329102574024195
  (0, 33898)	0.04887670337277681
  (0, 726)	0.099211621773653
  (0, 7768)	0.023605824427635997
  (0, 5536)	0.02366683625825516
  (0, 22919)	0.07887022120112676
  (0, 7968)	0.018465775286738234
  (0, 34025)	0.10107981959088336
  (0, 13379)	0.08262908078390495
  (0, 6808)	0.029020217043330004
  (0, 27622)	0.024571378546623446
  (0, 12161)	0.11439422781776977
  (0, 20010)	0.11439422781776977
  (0, 37657)	0.11439422781776977
  (0, 1421)	0.08506106847065786
  (0, 15907)	0.060063390018704935
  (0, 1002)	0.09547044114614504
  :	:
  (18491, 37607)	0.4535260910818358
  (18492, 9291)	0.4069843006486021
  (18492, 33240)	0.40184033311831224
  (18492, 10400)	0.39293380153157054
  (18492, 9753)	0.3860411653946236

In [None]:
#let's view the shape of this feature_vectors

feature_vectors.shape

(18495, 40946)

In [None]:
#now let's get the similarity scores using cosine similarity

from sklearn.metrics.pairwise import cosine_similarity

similarity = cosine_similarity(feature_vectors)
print(similarity)

[[1.         0.02277279 0.00635855 ... 0.00695984 0.00695984 0.00388344]
 [0.02277279 1.         0.02227394 ... 0.01349398 0.01349398 0.        ]
 [0.00635855 0.02227394 1.         ... 0.03974567 0.03974567 0.01113746]
 ...
 [0.00695984 0.01349398 0.03974567 ... 1.         1.         0.        ]
 [0.00695984 0.01349398 0.03974567 ... 1.         1.         0.        ]
 [0.00388344 0.         0.01113746 ... 0.         0.         1.        ]]


In [None]:
#let's view the shape of similarity values

similarity.shape

(18495, 18495)
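The full matrix holds 18495 × 18495 ≈ 342 million float64 values, roughly 2.7 GB, although a single recommendation only needs one row. The sketch below (toy corpus, hypothetical `query_position`) computes just that row. Because `TfidfVectorizer` L2-normalizes rows by default, a plain dot product via `linear_kernel` gives the same scores as `cosine_similarity`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy corpus
docs = ["action fantasy demons", "drama fantasy romance", "action sports school", "action fantasy"]
X = TfidfVectorizer().fit_transform(docs)

query_position = 0  # row position (not Rank) of the chosen title

# Similarity of one row against all rows: shape (n_docs,) instead of (n_docs, n_docs)
row = cosine_similarity(X[query_position], X).ravel()

# Rows are already unit length, so the dot product equals the cosine similarity
row_dot = linear_kernel(X[query_position], X).ravel()
print(row.shape, np.allclose(row, row_dot))
```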

In [None]:
#let's create a list with all the anime names given in the dataset

list_of_all_titles = dataset['Name'].tolist()
print(list_of_all_titles)




In [None]:
#let's view the length of the list

len(list_of_all_titles)


18495

In [None]:
#Enter an anime name to find similar animes
anime_name = input(' Enter your favourite anime name : ')


 Enter your favourite anime name : Goblin Slayer


In [None]:
#Let's find the close match for the anime name given by the user

import difflib
find_close_match = difflib.get_close_matches(anime_name, list_of_all_titles)
print(find_close_match)


['Goblin Slayer', 'Goblin Slayer 2', 'Goblin Slayer PV']
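`difflib.get_close_matches` returns at most `n` matches (default 3) whose similarity ratio is at least `cutoff` (default 0.6), best match first. Matching is character-based and case-sensitive, and the result is an empty list when nothing clears the cutoff. A sketch on a short hypothetical title list:

```python
import difflib

# Hypothetical short list of titles
titles = ["Goblin Slayer", "Goblin Slayer 2", "Goblin Slayer PV", "Fruits Basket the Final Season"]

# n caps the number of matches; cutoff (0 to 1) drops weak matches
matches = difflib.get_close_matches("Goblin Slayer", titles, n=3, cutoff=0.6)
print(matches)

# An unrelated query returns an empty list, so check it before indexing with [0]
no_match = difflib.get_close_matches("xyz", titles)
print(no_match[0] if no_match else "No close match found")
```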


In [None]:
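#Take the best (first) close match; get_close_matches returns an empty list if no title is similar enough
if not find_close_match:
  raise ValueError(f'No close match found for {anime_name!r}')
close_match = find_close_match[0]
print(close_match)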


Goblin Slayer


In [None]:
#Find the row position (0-based index) of the anime; the similarity matrix is indexed by row position, not by Rank
index_of_the_anime = dataset[dataset.Name == close_match].index[0]
print(index_of_the_anime)


775


In [None]:
#Sanity check: Rank is 1-based, so the anime at row position 775 (0-based) has Rank 776

dataset.Rank[775]

776
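Because `Rank` is 1-based while the dataframe's default index and the rows of `similarity` are 0-based positions, using a Rank value as a row index selects the neighbouring anime. One way to keep the two apart is a name-to-position lookup, sketched here on a toy frame. It assumes each title appears once; with duplicate names, indexing the lookup returns several positions.

```python
import pandas as pd

# Toy frame: Rank is 1-based, the default index is the 0-based row position
toy = pd.DataFrame({"Rank": [1, 2, 3], "Name": ["A", "B", "C"]})

# Map each title to its row position, which is what indexes the similarity matrix
position_by_name = pd.Series(toy.index, index=toy["Name"])

position = position_by_name["C"]
print(position)                     # row position to use with similarity[...]
print(toy.loc[position, "Rank"])    # Rank is position + 1 in this frame
```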

In [None]:
#now let's get the similarity row for the selected index
#These will be the similarity values for the anime entered by the user

similarity_score = list(enumerate(similarity[index_of_the_anime]))
print(similarity_score)


[(0, 0.007966087079151927), (1, 0.020322447159423388), (2, 0.008533558841007679), (3, 0.017944952399110998), (4, 0.006537517330891695), (5, 0.044142414287267465), (6, 0.015497344765514379), (7, 0.014244703254741992), (8, 0.013277180033321869), (9, 0.013442570308015322), (10, 0.025156505869397917), (11, 0.018364740042839073), (12, 0.01205038419076037), (13, 0.014549459750965525), (14, 0.031127210735230105), (15, 0.012409938557440775), (16, 0.023657478860669698), (17, 0.030740837637859515), (18, 0.003956658512863833), (19, 0.03982183028980896), (20, 0.015449267652431303), (21, 0.011035293578790777), (22, 0.0098775657752283), (23, 0.01041496656938727), (24, 0.04432858027864866), (25, 0.021919295383247016), (26, 0.018429455838495402), (27, 0.026277922976548594), (28, 0.03801076899513908), (29, 0.011385504841952403), (30, 0.033429245064934124), (31, 0.024990869702956574), (32, 0.004831161350968242), (33, 0.030653002197757725), (34, 0.03610193859208602), (35, 0.012416746773904374), (36, 0.00

In [None]:
#Length of Similarity Score 

len(similarity_score)

18495

In [None]:
#now let's sort the animes based on their similarity score

sorted_similar_animes = sorted(similarity_score, key = lambda x:x[1], reverse = True)
print(sorted_similar_animes)

[(776, 1.0), (358, 0.5325471320251958), (16424, 0.24833029870412643), (14732, 0.23905936860654065), (12728, 0.21876806125366685), (6356, 0.2063076029954719), (2599, 0.20285404500722576), (4537, 0.1873888698663106), (1652, 0.18672956593614337), (14670, 0.18322775556010032), (18446, 0.170354674983398), (15253, 0.16351080304915588), (15140, 0.1614192986353963), (11509, 0.15358014865391578), (7056, 0.1530663300048976), (4597, 0.14919199143210043), (2311, 0.14381758045195767), (13739, 0.14349676147415624), (15340, 0.1425891869674832), (9349, 0.14227064114390905), (2005, 0.1392675655865469), (13612, 0.1381863936532685), (10406, 0.13783979619335748), (14127, 0.13722884994879525), (15043, 0.1366969006200837), (14539, 0.13630067071873284), (16663, 0.1356943855185195), (1302, 0.13498289591126764), (12550, 0.13490547764684693), (3142, 0.13440988299698373), (15287, 0.13241379667119688), (15363, 0.13008487159326698), (15265, 0.12943520850867393), (3267, 0.12792563208927377), (8321, 0.12702201575658
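The sorted list starts with the query itself (score 1.0), and other titles can also score 1.0 when their combined features are identical, as the duplicate rows in the similarity matrix printed above suggest. A NumPy alternative to sorting a list of tuples is sketched below on a hypothetical score row; it orders positions by descending score and skips the query's own position.

```python
import numpy as np

# Hypothetical similarity row for one query (position 1 is the query itself)
scores = np.array([0.10, 1.00, 0.53, 0.00, 0.25])
query_position = 1
k = 2

# Sort positions by descending score, drop the query itself, keep the top k
order = np.argsort(scores)[::-1]
top_k = [int(p) for p in order if p != query_position][:k]
print(top_k)
```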

In [None]:
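#Optionally, the dataframe index could be replaced by 'Rank'; this stays commented out because
#the rows of the similarity matrix follow the default 0-based row positions
#dataset = dataset.set_index('Rank')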

In [None]:
#After sorting, let's recommend (print) the names of the 10 most similar animes, looked up by row position

print('Animes suggested for you are: \n')

for i, (index, score) in enumerate(sorted_similar_animes[:10], start=1):
  #index is the row position in the dataframe; the score itself is not printed
  title_from_index = dataset['Name'].iloc[index]
  print(i, '.', title_from_index)



Animes suggested for you are: 

1 . Goblin Slayer
2 . Detective Conan Movie 15: Quarter of Silence
3 . Shenmue the Animation
4 . Hyadain: Kara-age-kun Ondo 2012
5 . Kamigami no Ki
6 . Tonagura!
7 . Sound! Euphonium: Ready, Set, Monaka
8 . Tensei Shoya kara Musabori H: Ouji no Honmei wa Akuyaku Reijou
9 . Jewelpet Twinkle Special
10 . Navia Dratp
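The steps of this notebook can be collected into one function. The sketch below is an illustration, not the author's code: `recommend` is a hypothetical helper name, the toy titles are invented, and it differs from the cells above by computing only the query's similarity row and by skipping the query title itself.

```python
import difflib

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend(df, features, title, k=10):
    """Hypothetical helper: return up to k titles most similar to `title`."""
    # Combine the selected text columns into one string per row
    text = df[features].fillna("").apply(" ".join, axis=1)
    vectors = TfidfVectorizer().fit_transform(text)

    names = df["Name"].tolist()
    matches = difflib.get_close_matches(title, names)
    if not matches:
        return []
    # Row position (not Rank) of the first occurrence of the best match
    position = names.index(matches[0])

    # Similarity of the query row against every row, highest first
    scores = cosine_similarity(vectors[position], vectors).ravel()
    order = scores.argsort()[::-1]
    return [names[p] for p in order if p != position][:k]


# Hypothetical toy data
toy = pd.DataFrame({
    "Name": ["Goblin Slayer", "Dark Knights", "Happy Farm"],
    "Tags": ["Action, Dark Fantasy, Monsters", "Action, Dark Fantasy", "Comedy, Slice of Life"],
    "Type": ["TV", "TV", "Movie"],
})
print(recommend(toy, ["Tags", "Type"], "Goblin Slayer", k=2))
```

On the real data this would be called as `recommend(dataset, selected_features, anime_name)`.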