## Importing Required Modules

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors
import tensorflow as tf

print(tf.__version__)

2.3.1


## Fetching the Dataset as a Pandas DataFrame

In [2]:
anime_df = pd.read_csv('anime.csv')
anime_df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [3]:
anime_df['type'].unique()

array(['Movie', 'TV', 'OVA', 'Special', 'Music', 'ONA', nan], dtype=object)

In [4]:
anime_df_new = pd.read_csv('anime_2.csv')
anime_df_new.head()

Unnamed: 0,anime_id,title,title_english,title_japanese,title_synonyms,image_url,type,source,episodes,status,...,background,premiered,broadcast,related,producer,licensor,studio,genre,opening_theme,ending_theme
0,11013,Inu x Boku SS,Inu X Boku Secret Service,妖狐×僕SS,Youko x Boku SS,https://myanimelist.cdn-dena.com/images/anime/...,TV,Manga,12,Finished Airing,...,Inu x Boku SS was licensed by Sentai Filmworks...,Winter 2012,Fridays at Unknown,"{'Adaptation': [{'mal_id': 17207, 'type': 'man...","Aniplex, Square Enix, Mainichi Broadcasting Sy...",Sentai Filmworks,David Production,"Comedy, Supernatural, Romance, Shounen","['""Nirvana"" by MUCC']","['#1: ""Nirvana"" by MUCC (eps 1, 11-12)', '#2: ..."
1,2104,Seto no Hanayome,My Bride is a Mermaid,瀬戸の花嫁,The Inland Sea Bride,https://myanimelist.cdn-dena.com/images/anime/...,TV,Manga,26,Finished Airing,...,,Spring 2007,Unknown,"{'Adaptation': [{'mal_id': 759, 'type': 'manga...","TV Tokyo, AIC, Square Enix, Sotsu",Funimation,Gonzo,"Comedy, Parody, Romance, School, Shounen","['""Romantic summer"" by SUN&LUNAR']","['#1: ""Ashita e no Hikari (明日への光)"" by Asuka Hi..."
2,5262,Shugo Chara!! Doki,Shugo Chara!! Doki,しゅごキャラ！！どきっ,"Shugo Chara Ninenme, Shugo Chara! Second Year",https://myanimelist.cdn-dena.com/images/anime/...,TV,Manga,51,Finished Airing,...,,Fall 2008,Unknown,"{'Adaptation': [{'mal_id': 101, 'type': 'manga...","TV Tokyo, Sotsu",,Satelight,"Comedy, Magic, School, Shoujo","['#1: ""Minna no Tamago (みんなのたまご)"" by Shugo Cha...","['#1: ""Rottara Rottara (ロッタラ ロッタラ)"" by Buono! ..."
3,721,Princess Tutu,Princess Tutu,プリンセスチュチュ,,https://myanimelist.cdn-dena.com/images/anime/...,TV,Original,38,Finished Airing,...,Princess Tutu aired in two parts. The first pa...,Summer 2002,Fridays at Unknown,"{'Adaptation': [{'mal_id': 1581, 'type': 'mang...","Memory-Tech, GANSIS, Marvelous AQL",ADV Films,Hal Film Maker,"Comedy, Drama, Magic, Romance, Fantasy","['""Morning Grace"" by Ritsuko Okazaki']","['""Watashi No Ai Wa Chiisaikeredo"" by Ritsuko ..."
4,12365,Bakuman. 3rd Season,Bakuman.,バクマン。,Bakuman Season 3,https://myanimelist.cdn-dena.com/images/anime/...,TV,Manga,25,Finished Airing,...,,Fall 2012,Unknown,"{'Adaptation': [{'mal_id': 9711, 'type': 'mang...","NHK, Shueisha",,J.C.Staff,"Comedy, Drama, Romance, Shounen","['#1: ""Moshimo no Hanashi (もしもの話)"" by nano.RIP...","['#1: ""Pride on Everyday"" by Sphere (eps 1-13)..."


### A second dataset to merge with the first, for wider coverage and a more accurate recommendation system

In [5]:
anime_df_new = anime_df_new[['anime_id' , 'title' , 'type' , 'episodes', 'genre' , 'score' , 'members']]
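# Assign the renamed frame instead of renaming in place on a column selection (avoids SettingWithCopyWarning)
anime_df_new = anime_df_new.rename(columns = {'title' : 'name' , 'score' : 'rating'})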
anime_df_new.head()

Unnamed: 0,anime_id,name,type,episodes,genre,rating,members
0,11013,Inu x Boku SS,TV,12,"Comedy, Supernatural, Romance, Shounen",7.63,283882
1,2104,Seto no Hanayome,TV,26,"Comedy, Parody, Romance, School, Shounen",7.89,204003
2,5262,Shugo Chara!! Doki,TV,51,"Comedy, Magic, School, Shoujo",7.55,70127
3,721,Princess Tutu,TV,38,"Comedy, Drama, Magic, Romance, Fantasy",8.21,93312
4,12365,Bakuman. 3rd Season,TV,25,"Comedy, Drama, Romance, Shounen",8.67,182765


In [6]:
anime_df_new = anime_df_new[anime_df_new['type'] != 'Unknown']
anime_df_new['type'].unique()

array(['TV', 'Movie', 'Music', 'OVA', 'ONA', 'Special'], dtype=object)

In [7]:

print(len(anime_df_new.index))
print(len(anime_df.index))
# Append only the rows whose anime_id is not already present in anime_df.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
new_rows = anime_df_new[~anime_df_new['anime_id'].isin(anime_df['anime_id'])]
anime_df = pd.concat([anime_df , new_rows] , ignore_index=True)

14448
12294


In [8]:
print(len(anime_df.index))

14472
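
The merge above keeps every row of the first dataset and adds only the rows of the second whose `anime_id` is not yet present. A minimal sketch of that idea on two toy frames (the names `base` and `extra` and all values are made up for illustration), using `isin` and `pd.concat` rather than a row-by-row loop:

```python
import pandas as pd

# Toy stand-ins for anime_df and anime_df_new (illustrative values only)
base = pd.DataFrame({'anime_id': [1, 2, 3], 'name': ['A', 'B', 'C']})
extra = pd.DataFrame({'anime_id': [3, 4, 5], 'name': ['C2', 'D', 'E']})

# Keep only the rows of `extra` whose anime_id is absent from `base`
unseen = extra[~extra['anime_id'].isin(base['anime_id'])]

# Stack them under the original rows and rebuild a clean 0..n-1 index
merged = pd.concat([base, unseen], ignore_index=True)
print(merged)
```

The row with `anime_id` 3 from `extra` is dropped because `base` already has that id, so the first dataset wins on conflicts.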


Listing all the types in the dataset shows a nan entry that needs to be replaced.

In [9]:
types = anime_df['type'].unique().tolist()
print("Number of Types : {} \nTypes: \n".format(len(types)) , types)

Number of Types : 7 
Types: 
 ['Movie', 'TV', 'OVA', 'Special', 'Music', 'ONA', nan]


## Tuning the Dataset
First, we will fill the empty or 'Unknown' cells, one case at a time.
A movie has a single episode, so I will fill the unknown episode cells of type Movie with 1.
Hentai titles also mostly have 1 episode, so I will do the same for them.

Counting the missing (NaN) cells in each column

In [10]:
anime_df.isnull().sum()

anime_id      0
name          0
genre        84
type         25
episodes      0
rating      230
members       0
dtype: int64

In [11]:
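# Name the 'episodes' column: .loc[mask] = 1 alone would overwrite every column of the matching rows
anime_df.loc[(anime_df['genre'] == 'Hentai') & (anime_df['episodes'] == 'Unknown') , 'episodes'] = 1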

In [12]:
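# 'Movie' is a value of the type column, not of genre; only the 'episodes' cell is filled
anime_df.loc[(anime_df['type'] == 'Movie') & (anime_df['episodes'] == 'Unknown') , 'episodes'] = 1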

As the rows below show, music entries also have a single episode, so we will replace the unknown cells in the episodes column of type Music with 1.

In [13]:
anime_df[anime_df['type'] == 'Music'].head(3)

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
169,34240,Shelter,"Music, Sci-Fi",Music,1,8.38,71136
336,731,Interstella5555: The 5tory of The 5ecret 5tar ...,"Adventure, Drama, Music, Sci-Fi",Music,1,8.17,31464
533,17949,The Everlasting Guilty Crown,Music,Music,1,8.0,11663


In [14]:
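# Fill only the 'episodes' cell; without the column label the whole row (including type) would become 1
anime_df.loc[(anime_df['type'] == 'Music') & (anime_df['episodes'] == 'Unknown') , 'episodes'] = 1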
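
The fills above are meant to change one column for a subset of rows. A minimal sketch of that pattern on a toy frame (all values are made up for illustration), showing how passing the column label to `.loc` keeps the other columns intact:

```python
import pandas as pd

# Toy frame (illustrative values only); episodes mixes strings and numbers like the real column
df = pd.DataFrame({
    'name': ['Film X', 'Song Y', 'Show Z', 'Show W'],
    'type': ['Movie', 'Music', 'TV', 'TV'],
    'episodes': ['Unknown', 'Unknown', 12, 'Unknown'],
})

# Rows that are single-episode by nature and still lack an episode count
mask = df['type'].isin(['Movie', 'Music']) & (df['episodes'] == 'Unknown')

# Naming the column limits the assignment to 'episodes';
# df.loc[mask] = 1 would overwrite every column of the matching rows
df.loc[mask, 'episodes'] = 1
print(df)
```

The TV row with an unknown count is left alone here, so it can be handled by a later step such as a median fill.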

#### Replacing the remaining 'Unknown' cells in episodes with NaN

In [15]:
anime_df['episodes'] = anime_df['episodes'].map(lambda x: np.nan if x == 'Unknown' else x)

Filling the NaN values with the median of the episodes column

In [16]:
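# Assign the result back: an in-place fillna on a selected column may act on a copy in newer pandas versions
anime_df['episodes'] = anime_df['episodes'].fillna(anime_df['episodes'].median())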

In [17]:
anime_df['episodes'].isnull().sum()

0
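
The two steps above (turn 'Unknown' into NaN, then fill with the median) can also be sketched with `pd.to_numeric`, which the notebook itself does not use; it is swapped in here because it yields a numeric column in one call. The values are made up for illustration:

```python
import pandas as pd

# Toy episode counts as they might look after reading a CSV (illustrative values)
episodes = pd.Series(['12', 'Unknown', '24', '1', 'Unknown'])

# Anything that cannot be parsed as a number (here 'Unknown') becomes NaN
episodes_num = pd.to_numeric(episodes, errors='coerce')

# median() skips NaN by default, so it is computed over 12, 24 and 1
median = episodes_num.median()
episodes_filled = episodes_num.fillna(median)
print(episodes_filled.tolist())
```

Unlike the notebook's `episodes` column, which stays of dtype `object`, the result here is a float column.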

In [18]:
anime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14472 entries, 0 to 14471
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  14472 non-null  int64  
 1   name      14472 non-null  object 
 2   genre     14388 non-null  object 
 3   type      14447 non-null  object 
 4   episodes  14472 non-null  object 
 5   rating    14247 non-null  float64
 6   members   14472 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 791.6+ KB


Changing data types of columns

In [19]:
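# `==` compares instead of assigning, so this cell only displays a boolean Series;
# the name is actually replaced in cell In [23]
anime_df['name'] == anime_df['name'].replace(['Itadaki! Seieki♥'] , 'Itadaki! Seieki')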

0        True
1        True
2        True
3        True
4        True
         ... 
14467    True
14468    True
14469    True
14470    True
14471    True
Name: name, Length: 14472, dtype: bool

In [20]:
anime_df['members'] = anime_df['members'].astype(float)

In [21]:
# Assign the result back instead of relying on an in-place fillna on a selected column
anime_df['rating'] = anime_df['rating'].fillna(anime_df['rating'].median())

In [22]:
anime_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14472 entries, 0 to 14471
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  14472 non-null  int64  
 1   name      14472 non-null  object 
 2   genre     14388 non-null  object 
 3   type      14447 non-null  object 
 4   episodes  14472 non-null  object 
 5   rating    14472 non-null  float64
 6   members   14472 non-null  float64
dtypes: float64(2), int64(1), object(4)
memory usage: 791.6+ KB


In [23]:
anime_df.replace(to_replace ="Itadaki! Seieki♥", 
                 value ='Itadaki! Seieki' , inplace = True) 

## Creating another dataset with the relevant features, one-hot encoding the genre and type columns with get_dummies

In [24]:
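# Genres are separated by ', ' (comma and space); with sep = ',' the labels 'Drama' and ' Drama' become separate columns
anime_data = pd.concat([anime_df['genre'].str.get_dummies(sep = ', '),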
                         pd.get_dummies(anime_df['type']),
                         anime_df['episodes'],
                         anime_df['rating'],
                         anime_df['members'],
                         ] , axis = 1)

## How the Dataset looks after one hot encoding 

In [25]:
anime_data.head()

Unnamed: 0,Adventure,Cars,Comedy,Dementia,Demons,Drama,Ecchi,Fantasy,Game,Harem,...,1,Movie,Music,ONA,OVA,Special,TV,episodes,rating,members
0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,1,9.37,200630.0
1,1,0,0,0,0,1,0,1,0,0,...,0,0,0,0,0,0,1,64,9.26,793665.0
2,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,51,9.25,114262.0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,24,9.17,673572.0
4,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,51,9.16,151266.0
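
A minimal sketch of the one-hot encoding step on toy data (values made up for illustration). It assumes the genre labels are separated by a comma followed by a space, as in the `genre` column shown earlier, and therefore passes `sep=', '`:

```python
import pandas as pd

# Toy genre and type columns (illustrative values only)
genre = pd.Series(['Drama, Romance', 'Action, Drama', 'Sci-Fi'])
type_ = pd.Series(['Movie', 'TV', 'TV'])

# One 0/1 column per distinct genre label; a row can have several 1s
genre_dummies = genre.str.get_dummies(sep=', ')

# One column per type; each row has exactly one active type
type_dummies = pd.get_dummies(type_)

features = pd.concat([genre_dummies, type_dummies], axis=1)
print(features.columns.tolist())

# With sep=',' the leading space stays part of the label,
# so 'Drama' and ' Drama' end up in two different columns
print(genre.str.get_dummies(sep=',').columns.tolist())
```

Keeping each genre in a single column matters for the distance computation later on: two titles that share a genre should agree in that column regardless of where the genre appears in the list.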


In [26]:
scaler = MinMaxScaler(feature_range=(0,1))

anime_data_scaled = scaler.fit_transform(anime_data)
anime_data_scaled

array([[0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        5.50055006e-04, 9.37000000e-01, 1.97876158e-01],
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        3.52035204e-02, 9.26000000e-01, 7.82771174e-01],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e+00, ...,
        2.80528053e-02, 9.25000000e-01, 1.12693643e-01],
       ...,
       [1.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        5.50055006e-04, 0.00000000e+00, 1.71710308e-03],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        5.50055006e-04, 6.00000000e-01, 3.15607688e-05],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
        5.50055006e-04, 5.15000000e-01, 4.69466436e-04]])

In [27]:
anime_data_scaled.shape

(14472, 93)
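
Scaling matters here because `members` runs into the hundreds of thousands while the genre flags are 0 or 1, so without it the distance between two titles would be dominated by `members`. A minimal sketch of what `MinMaxScaler` computes per column, checked against a manual calculation on toy values (made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two toy feature columns on very different scales (illustrative values)
X = np.array([[1.0, 100.0],
              [13.0, 500.0],
              [25.0, 900.0]])

# Column-wise min-max scaling: (x - min) / (max - min)
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
print(scaled)
print(np.allclose(manual, scaled))
```

After scaling, both columns span the range 0 to 1, so each contributes on a comparable scale to a Euclidean distance.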

## Using unsupervised Nearest Neighbours with the ball_tree algorithm and 6 nearest neighbours

Here, the nearest neighbours of an anime are the animes whose feature vectors are closest to it, i.e. the ones most similar to it.

### Fitting Data

In [28]:
nn_bt = NearestNeighbors(n_neighbors=6 , algorithm='ball_tree').fit(anime_data_scaled)

### Getting the distances and indices of each anime's 5 closest animes plus the anime itself (5 + 1 = 6)

In [29]:
distances , indices = nn_bt.kneighbors(anime_data_scaled)

In [30]:
print("Distances shape : {} \nIndices Shape: {} \nDistances data overview : {} \nIndices data overview : {}".format(distances.shape , indices.shape , distances[0], indices[0]))

Distances shape : (14472, 6) 
Indices Shape: (14472, 6) 
Distances data overview : [0.         1.01506549 1.03095542 1.0310255  1.41625771 1.43204444] 
Indices data overview : [   0  208 1494 1959   60 2103]
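
The overview above shows a distance of 0 in the first position, which is the anime matched with itself. A minimal sketch on four toy points (made up for illustration) of how `kneighbors` lays out its two result arrays:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four toy points in 2-D (illustrative values); rows 0 and 1 are close, as are rows 2 and 3
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [5.0, 5.0],
              [5.0, 7.0]])

# Ask for 2 neighbours: the point itself plus its single closest other point
nn = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nn.kneighbors(X)

# Column 0 is each point matched with itself (distance 0);
# column 1 holds the closest other point and its Euclidean distance
print(indices)
print(distances)
```

This is why the recommendation step later skips the first entry of each row of `indices`.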


### Creating a function that would return the index of the anime when provided its name

In [31]:
def get_index(name):
  # Index labels of all rows whose name matches exactly
  matches = anime_df[anime_df['name'] == name].index.tolist()
  if not matches:
    return "Could not find the Anime"
  return matches[0]

get_index('Steins;Gate')

3

### Creating a function that would return the name of the anime when provided its index

In [32]:
def get_name(id):
  # Names of the rows whose index label equals id (empty when there is no such row)
  names = anime_df[anime_df.index == id]['name'].tolist()
  if not names:
    return "Could not find the Anime"
  return names[0]

get_name(2)

'Gintama°'
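
The two helpers above return a message string when nothing matches, which the caller then has to tell apart from a real result. A minimal sketch of an alternative that returns `None` instead; the function name `find_index` and the toy catalogue are hypothetical and not part of the notebook:

```python
import pandas as pd

# Toy catalogue (illustrative values only)
catalogue = pd.DataFrame({'name': ['Steins;Gate', 'Slam Dunk', 'One Piece']})

def find_index(df, name):
    """Return the index label of the first row whose name matches, or None."""
    matches = df.index[df['name'] == name]
    return int(matches[0]) if len(matches) > 0 else None

print(find_index(catalogue, 'Slam Dunk'))
print(find_index(catalogue, 'Not In The List'))
```

Returning `None` lets the caller write `if idx is None:` rather than comparing against a specific message.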

## Creating a function that prints all the relevant data about the anime

In [33]:
def get_info(id):
  row = anime_df.loc[id]   # look the row up once instead of filtering the frame for every field
  print("Name :" , row['name'])
  print("Rating :" , row['rating'])
  print("Number of Episodes :" , row['episodes'])
  print("Genre :" , row['genre'])
  print("Type :" , row['type'])
  print("Number of Members : " , row['members'])

get_info(3)

Name : Steins;Gate
Rating : 9.17
Number of Episodes : 24
Genre : Sci-Fi, Thriller
Type : TV
Number of Members :  673572.0


## Creating a function that recommends anime to the user based on an anime they like

In [34]:
def recommend_me(name = None , id = None):
  if name is not None:
    id = get_index(name)
  if not isinstance(id , (int , np.integer)):   #get_index returns a message string when the name is unknown
    print("Could not find the Anime")
    return
  print("Here are some of the Animes you would like to watch :")

  for index in indices[id][1:]:   #the first index in indices will be the anime itself so we have to print [1:] i.e. the other animes
    print("------------------------------------------------------------------")
    get_info(index)

recommend_me('Shingeki no Kyojin')

Here are some of the Animes you would like to watch :
------------------------------------------------------------------
Name : Shingeki no Kyojin Season 2
Rating : 6.46
Number of Episodes : 1.0
Genre : Action, Drama, Fantasy, Shounen, Super Power
Type : TV
Number of Members :  170054.0
------------------------------------------------------------------
Name : One Piece
Rating : 8.58
Number of Episodes : 1.0
Genre : Action, Adventure, Comedy, Drama, Fantasy, Shounen, Super Power
Type : TV
Number of Members :  504862.0
------------------------------------------------------------------
Name : Shingeki no Kyojin OVA
Rating : 7.88
Number of Episodes : 3
Genre : Action, Drama, Fantasy, Shounen, Super Power
Type : OVA
Number of Members :  121063.0
------------------------------------------------------------------
Name : Utawarerumono: Itsuwari no Kamen
Rating : 7.35
Number of Episodes : 25
Genre : Action, Drama, Fantasy
Type : TV
Number of Members :  55851.0
----------------------------------
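
The recommendation step boils down to: find the row of the given title, take that row of the neighbour index array, drop the first entry (the title itself) and look up the remaining names. A minimal sketch on toy data; the function `similar_titles`, the names and the neighbour array are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy titles and a toy neighbour table (illustrative values only).
# Each row lists the item's own index first, then its neighbours, nearest first.
names = pd.Series(['A', 'A Season 2', 'B', 'B Movie'])
indices = np.array([[0, 1, 2],
                    [1, 0, 3],
                    [2, 3, 0],
                    [3, 2, 1]])

def similar_titles(title, names, indices):
    """Return the names of the neighbours of `title`, or an empty list if it is unknown."""
    matches = names.index[names == title]
    if len(matches) == 0:
        return []
    row = indices[matches[0]]
    # Skip position 0, which is the title itself
    return names.iloc[row[1:]].tolist()

print(similar_titles('A', names, indices))
print(similar_titles('Unknown title', names, indices))
```

Returning a list rather than printing makes the result easy to reuse or test.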

## Run this cell if you want Recommendations

In [35]:
anime_name = input("Enter Name of an anime you like : ")

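# Filtering on an unknown name returns an empty DataFrame rather than raising, so test for emptiness
if anime_df[anime_df['name'] == anime_name].empty:
  print("NO SUCH ANIME FOUND")
else:
  recommend_me(anime_name)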

Enter Name of an anime you like : Hunter x Hunter (2011)
Here are some of the Animes you would like to watch :
------------------------------------------------------------------
Name : Hunter x Hunter
Rating : 8.48
Number of Episodes : 62
Genre : Action, Adventure, Shounen, Super Power
Type : TV
Number of Members :  166255.0
------------------------------------------------------------------
Name : Nano Invaders
Rating : 7.08
Number of Episodes : 52
Genre : Action, Adventure, Shounen, Super Power
Type : TV
Number of Members :  519.0
------------------------------------------------------------------
Name : Boruto: Naruto Next Generations
Rating : 7.03
Number of Episodes : 0
Genre : Action, Adventure, Martial Arts, Shounen, Super Power
Type : TV
Number of Members :  213675.0
------------------------------------------------------------------
Name : Rekka no Honoo
Rating : 7.44
Number of Episodes : 42
Genre : Action, Adventure, Martial Arts, Shounen, Super Power
Type : TV
Number of Members 

## Trying the recommendation system on Dragon Ball Z (shounen , action)

In [36]:
recommend_me('Dragon Ball Z')

Here are some of the Animes you would like to watch :
------------------------------------------------------------------
Name : Dragon Ball Kai
Rating : 7.95
Number of Episodes : 97
Genre : Action, Adventure, Comedy, Fantasy, Martial Arts, Shounen, Super Power
Type : TV
Number of Members :  116832.0
------------------------------------------------------------------
Name : Dragon Ball Super
Rating : 7.4
Number of Episodes : 1.0
Genre : Action, Adventure, Comedy, Fantasy, Martial Arts, Shounen, Super Power
Type : TV
Number of Members :  111443.0
------------------------------------------------------------------
Name : Dragon Ball Kai (2014)
Rating : 8.01
Number of Episodes : 61
Genre : Action, Adventure, Comedy, Fantasy, Martial Arts, Shounen, Super Power
Type : TV
Number of Members :  42666.0
------------------------------------------------------------------
Name : One Piece
Rating : 8.58
Number of Episodes : 1.0
Genre : Action, Adventure, Comedy, Drama, Fantasy, Shounen, Super Power
Ty

## Trying recommendation system on Haikyuu!! (Sports)

In [37]:
recommend_me('Haikyuu!!')

Here are some of the Animes you would like to watch :
------------------------------------------------------------------
Name : Haikyuu!! Second Season
Rating : 8.93
Number of Episodes : 25
Genre : Comedy, Drama, School, Shounen, Sports
Type : TV
Number of Members :  179342.0
------------------------------------------------------------------
Name : Haikyuu!!: Karasuno Koukou VS Shiratorizawa Gakuen Koukou
Rating : 9.15
Number of Episodes : 10
Genre : Comedy, Drama, School, Shounen, Sports
Type : TV
Number of Members :  93351.0
------------------------------------------------------------------
Name : Slam Dunk
Rating : 8.56
Number of Episodes : 101
Genre : Comedy, Drama, School, Shounen, Sports
Type : TV
Number of Members :  82570.0
------------------------------------------------------------------
Name : Kuroko no Basket 2nd Season
Rating : 8.58
Number of Episodes : 25
Genre : Comedy, School, Shounen, Sports
Type : TV
Number of Members :  243325.0
--------------------------------------

## Trying recommendation system on Hentai genre

In [38]:
recommend_me('Itadaki! Seieki')

Here are some of the Animes you would like to watch :
------------------------------------------------------------------
Name : Brandish
Rating : 6.89
Number of Episodes : 2
Genre : Hentai, Supernatural
Type : OVA
Number of Members :  6742.0
------------------------------------------------------------------
Name : Bible Black Gaiden
Rating : 6.89
Number of Episodes : 2
Genre : Hentai, Supernatural
Type : OVA
Number of Members :  14478.0
------------------------------------------------------------------
Name : Aku no Onna Kanbu: Full Moon Night
Rating : 6.85
Number of Episodes : 1
Genre : Hentai, Supernatural
Type : OVA
Number of Members :  3613.0
------------------------------------------------------------------
Name : Megachu!
Rating : 6.61
Number of Episodes : 3
Genre : Hentai, Supernatural
Type : OVA
Number of Members :  3991.0
------------------------------------------------------------------
Name : Hachishaku Hachiwa Keraku Meguri: Igyou Kaikitan The Animation
Rating : 6.59
Number