
# ANIME CONTENT-BASED RECOMMENDATION SYSTEM

In [3]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


df = pd.read_csv("anime_with_synopsis.csv")

In [4]:
df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...


In [5]:
# Count the missing values in each column of the dataset
df.isnull().sum()

MAL_ID       0
Name         0
Score        0
Genres       0
sypnopsis    8
dtype: int64

In [6]:
# dropna() drops every row that contains at least one missing (NaN) value.
# inplace=True applies the change to df itself instead of returning a new DataFrame.
df.dropna(inplace=True)

In [7]:
# duplicated() returns a boolean Series that marks each row repeating an earlier row;
# sum() then counts those duplicate rows.
df.duplicated().sum()

0

In [8]:
# "Unknown" marks a missing score; turn it into NaN so it can be imputed
df["Score"] = df["Score"].map(lambda x:np.nan if x=="Unknown" else x)

In [9]:
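# Fill missing scores with the median; assigning the result back avoids
# modifying a column selection in place
df["Score"] = df["Score"].fillna(df["Score"].median())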

In [10]:
df["Score"] = df["Score"].astype(float)
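
The three cleaning cells above can also be collapsed into a single conversion. The following is a minimal sketch on a toy frame, assuming `Score` arrives as strings with the literal "Unknown" marking missing values; the toy values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy data; values are illustrative only.
toy = pd.DataFrame({"Score": ["8.0", "Unknown", "6.0", "7.0"]})

# errors="coerce" turns anything non-numeric (here "Unknown") into NaN
# and returns a float column in one step.
scores = pd.to_numeric(toy["Score"], errors="coerce")

# Fill the gaps with the median of the valid scores (7.0 for this toy data).
toy["Score"] = scores.fillna(scores.median())

print(toy["Score"].tolist())
```

Computing the median after the numeric conversion keeps the imputed value independent of how pandas handles string columns.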

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16206 entries, 0 to 16213
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   MAL_ID     16206 non-null  int64  
 1   Name       16206 non-null  object 
 2   Score      16206 non-null  float64
 3   Genres     16206 non-null  object 
 4   sypnopsis  16206 non-null  object 
dtypes: float64(1), int64(1), object(3)
memory usage: 759.7+ KB
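
Note that the index above still runs from 0 to 16213 even though only 16206 rows remain: `dropna` keeps the original labels, so index labels and row positions no longer line up. This matters whenever a label is used to index a plain NumPy array. A small sketch with made-up rows (the toy frame is an assumption for illustration):

```python
import pandas as pd

# Toy frame with one missing synopsis; the data is invented.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "sypnopsis": ["x", None, "y", "z"],
})
toy = toy.dropna()

# The index label of "D" is still 3, but it now sits at row position 2.
label = toy[toy["Name"] == "D"].index[0]
position = toy.index.get_loc(label)
print(label, position)

# reset_index(drop=True) renumbers the rows so labels and positions agree again.
clean = toy.reset_index(drop=True)
print(list(clean.index))
```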


In [12]:
# Top 10 Anime Based on Score
df.sort_values(by='Score', ascending=False).head(10)

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
3446,5114,Fullmetal Alchemist: Brotherhood,9.19,"Action, Military, Adventure, Comedy, Drama, Ma...","""In order for something to be obtained, someth..."
14647,40028,Shingeki no Kyojin: The Final Season,9.17,"Action, Military, Mystery, Super Power, Drama,...",Gabi Braun and Falco Grice have been training ...
4953,9253,Steins;Gate,9.11,"Thriller, Sci-Fi",The self-proclaimed mad scientist Rintarou Oka...
5660,11061,Hunter x Hunter (2011),9.1,"Action, Adventure, Fantasy, Shounen, Super Power",Hunter x Hunter is set in a world where Hunter...
8879,28977,Gintama°,9.1,"Action, Comedy, Historical, Parody, Samurai, S...","Gintoki, Shinpachi, and Kagura return as the f..."
13720,38524,Shingeki no Kyojin Season 3 Part 2,9.1,"Action, Drama, Fantasy, Military, Mystery, Sho...",Seeking to restore humanity's diminishing hope...
5234,9969,Gintama',9.08,"Action, Sci-Fi, Comedy, Historical, Parody, Sa...","fter a one-year hiatus, Shinpachi Shimura retu..."
723,820,Ginga Eiyuu Densetsu,9.07,"Military, Sci-Fi, Space, Drama",The 150-year-long stalemate between the two in...
6377,15417,Gintama': Enchousen,9.04,"Action, Comedy, Historical, Parody, Samurai, S...","hile Gintoki Sakata was away, the Yorozuya fou..."
8854,28851,Koe no Katachi,9.0,"Drama, School, Shounen","s a wild youth, elementary school student Shou..."


In [13]:
# Split the Genres and sypnopsis strings into lists of tokens.
# split() with no arguments splits on whitespace, so genre tokens keep their trailing commas.
df['Genres'] = df['Genres'].apply(lambda x:x.split())
df['sypnopsis'] = df['sypnopsis'].apply(lambda x:x.split())

In [14]:
df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
0,1,Cowboy Bebop,8.78,"[Action,, Adventure,, Comedy,, Drama,, Sci-Fi,...","[In, the, year, 2071,, humanity, has, colonize..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"[Action,, Drama,, Mystery,, Sci-Fi,, Space]","[other, day,, another, bounty—such, is, the, l..."
2,6,Trigun,8.24,"[Action,, Sci-Fi,, Adventure,, Comedy,, Drama,...","[Vash, the, Stampede, is, the, man, with, a, $..."
3,7,Witch Hunter Robin,7.27,"[Action,, Mystery,, Police,, Supernatural,, Dr...","[ches, are, individuals, with, special, powers..."
4,8,Bouken Ou Beet,6.98,"[Adventure,, Fantasy,, Shounen,, Supernatural]","[It, is, the, dark, century, and, the, people,..."


In [15]:
# Remove any spaces inside each token
# (a no-op here, because split() already split on whitespace)
df['Genres'] = df['Genres'].apply(lambda x:[i.replace(" ","") for i in x])
df['sypnopsis'] = df['sypnopsis'].apply(lambda x:[i.replace(" ","") for i in x])
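
Because `split()` with no arguments splits on whitespace, a multi-word genre such as "Slice of Life" ends up as three separate tokens. One possible alternative, shown only as a sketch and not what this notebook does, is to split on commas and then remove the inner spaces so each genre stays a single token. The helper name `genre_tokens` is hypothetical.

```python
def genre_tokens(genres):
    """Split a comma-separated genre string into one token per genre."""
    return [g.strip().replace(" ", "") for g in genres.split(",") if g.strip()]

# Whitespace splitting breaks the multi-word genre apart and keeps the commas.
print("Adventure, Slice of Life, Comedy".split())

# Comma splitting keeps each genre as a single token.
print(genre_tokens("Adventure, Slice of Life, Comedy"))
```

With this variant the space-removal step has a real effect, since "Slice of Life" becomes the single token "SliceofLife".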

In [16]:
df['features'] = df['Genres'] + df['sypnopsis'] 

In [17]:
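# .copy() gives new_df its own data, so later column assignments
# do not trigger pandas' SettingWithCopyWarning
new_df = df[['Name', 'features']].copy()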

In [18]:
# convert list to string
new_df['features'] = new_df['features'].apply(lambda x:" ".join(x))


In [19]:
new_df

Unnamed: 0,Name,features
0,Cowboy Bebop,"Action, Adventure, Comedy, Drama, Sci-Fi, Spac..."
1,Cowboy Bebop: Tengoku no Tobira,"Action, Drama, Mystery, Sci-Fi, Space other da..."
2,Trigun,"Action, Sci-Fi, Adventure, Comedy, Drama, Shou..."
3,Witch Hunter Robin,"Action, Mystery, Police, Supernatural, Drama, ..."
4,Bouken Ou Beet,"Adventure, Fantasy, Shounen, Supernatural It i..."
...,...,...
16209,Daomu Biji Zhi Qinling Shen Shu,"Adventure, Mystery, Supernatural No synopsis i..."
16210,Mieruko-chan,"Comedy, Horror, Supernatural ko is a typical h..."
16211,Higurashi no Naku Koro ni Sotsu,"Mystery, Dementia, Horror, Psychological, Supe..."
16212,Yama no Susume: Next Summit,"Adventure, Slice of Life, Comedy New Yama no S..."


In [20]:
# Stemming reduces inflected or derived word forms to a common root (stem).
"""
For example, variants of the root word "like" include:

-> "likes"
-> "liked"
-> "likely"
-> "liking"

"""

import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [21]:
def stem(text):
    """Stem every whitespace-separated token in text and rejoin with spaces."""
    return " ".join(ps.stem(word) for word in text.split())

In [22]:
new_df['features'] = new_df['features'].apply(stem)


In [23]:
# convert to lowercase
new_df['features'] = new_df['features'].apply(lambda x:x.lower())


In [24]:
new_df.head()

Unnamed: 0,Name,features
0,Cowboy Bebop,"action, adventure, comedy, drama, sci-fi, spac..."
1,Cowboy Bebop: Tengoku no Tobira,"action, drama, mystery, sci-fi, space other da..."
2,Trigun,"action, sci-fi, adventure, comedy, drama, shou..."
3,Witch Hunter Robin,"action, mystery, police, supernatural, drama, ..."
4,Bouken Ou Beet,"adventure, fantasy, shounen, supernatur it is ..."


In [25]:
"""
CountVectorizer converts text into numerical data: each document becomes
a vector of token counts.

max_features caps the vocabulary at the terms that occur most frequently
across the whole corpus. For example, max_features=3 keeps only the 3 most
common terms in the data.

stop_words='english' removes the words in the built-in English stop word list.
"""

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')
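
To make the idea concrete, here is a hand-rolled bag-of-words sketch in plain Python. It is not how scikit-learn implements `CountVectorizer`; the helper `count_vectors` and the toy documents are assumptions for illustration, and stop-word removal is left out.

```python
from collections import Counter

def count_vectors(docs, max_features):
    """Return (vocabulary, vectors) built from the most frequent terms overall."""
    tokenized = [doc.lower().split() for doc in docs]
    totals = Counter(tok for doc in tokenized for tok in doc)
    # Keep the max_features most frequent terms (ties broken alphabetically),
    # then sort the surviving vocabulary alphabetically.
    top = sorted(totals, key=lambda t: (-totals[t], t))[:max_features]
    vocab = sorted(top)
    # Each document becomes a list of per-term counts, one entry per vocabulary term.
    vectors = [[doc.count(term) for term in vocab] for doc in tokenized]
    return vocab, vectors

docs = ["space bounty hunter", "space pirate space war", "bounty hunter war"]
vocab, vectors = count_vectors(docs, max_features=3)
print(vocab)    # ['bounty', 'hunter', 'space']
print(vectors)  # [[1, 1, 1], [0, 0, 2], [1, 1, 0]]
```

Terms outside the kept vocabulary ("pirate" and "war" here) simply do not contribute to any vector.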

In [26]:
vectors = cv.fit_transform(new_df['features']).toarray()

In [27]:
vectors

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 3, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [28]:
from sklearn.metrics.pairwise import cosine_similarity

In [37]:
# Compute the pairwise cosine similarity between all feature vectors.
similarity = cosine_similarity(vectors)

In [38]:
# Reuse the matrix computed above instead of recomputing it
similarity.shape

(16206, 16206)
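
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on the direction of the count vectors rather than on their magnitude. A minimal plain-Python sketch with made-up vectors follows; the helper `cosine` is hypothetical, and the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

a = [1, 1, 0]
b = [2, 2, 0]   # same direction as a, twice the length
c = [0, 0, 3]   # shares no terms with a

print(cosine(a, b))  # approximately 1.0
print(cosine(a, c))  # 0.0
```

For count vectors, which never contain negative entries, the score always falls between 0 (no shared terms) and 1 (identical term proportions).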

In [32]:
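# Pair each similarity score with its row position, sort by score in descending order,
# and skip the first entry (the anime itself) to get the 9 closest matches to row 0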
sorted(list(enumerate(similarity[0])),reverse=True,key=lambda x:x[1])[1:10]

[(3149, 0.34932619631159634),
 (5545, 0.33517751522573636),
 (1145, 0.31619510292053465),
 (5949, 0.2979355690895434),
 (15573, 0.2979355690895434),
 (365, 0.2931856917889426),
 (3669, 0.2922959000805237),
 (4028, 0.2878331051844618),
 (2077, 0.2844400619942872)]

In [33]:
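def recommend(anime):
    # Look up the row *position*, not the index label: dropna() left gaps in the
    # index, while the similarity matrix is indexed by position.
    anime_pos = np.flatnonzero(new_df['Name'].to_numpy() == anime)[0]
    scores = similarity[anime_pos]
    # Sort by score (descending) and skip the first entry, which is the anime itself.
    closest = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])[1:15]
    for pos, _ in closest:
        print(new_df.iloc[pos].Name)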
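
Slicing with `[1:15]` assumes the queried title sorts first. If another row has an identical feature vector, its score ties at the top, and the queried title itself can slip into the results. A sketch that filters by position instead, using a made-up score row (the helper `top_k` is hypothetical):

```python
def top_k(scores, query_pos, k):
    """Positions of the k highest scores, excluding the query row itself."""
    ranked = sorted(
        (pos for pos in range(len(scores)) if pos != query_pos),
        key=lambda pos: scores[pos],
        reverse=True,
    )
    return ranked[:k]

# Row 2 is the query; row 0 is an exact duplicate, so both score 1.0.
scores = [1.0, 0.2, 1.0, 0.7, 0.1]
print(top_k(scores, query_pos=2, k=2))  # [0, 3]

# The slice-based approach drops row 0 and keeps the query row 2 instead.
naive = sorted(enumerate(scores), reverse=True, key=lambda x: x[1])[1:3]
print([pos for pos, _ in naive])  # [2, 3]
```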

In [34]:
recommend('Shingeki no Kyojin')

Shingeki no Kyojin Season 2
Shingeki no Kyojin Season 3
Mushrambo
Shingeki no Kyojin Season 2 Movie: Kakusei no Houkou
Shingeki! Kyojin Chuugakkou
Noblesse: Pamyeol-ui Sijak
Kekkai Sensen & Beyond
Karas
Shingeki no Kyotou
Ajin
Kamisama Kazoku
Kuusou Kagaku Sekai Gulliver Boy
Wo Jiao Bai Xiaofei
Gyakkyou Burai Kaiji: Ultimate Survivor


In [35]:
recommend('Boku no Hero Academia')

Boku no Hero Academia 4th Season
Tiger & Bunny
Ore wa Teppei
Yume Senshi Wingman
Boku no Hero Academia 2nd Season
Samurai Flamenco
Angel Densetsu
Yuusha ni Narenakatta Ore wa Shibushibu Shuushoku wo Ketsui Shimashita. OVA
Nisekoi
Maoyuu Maou Yuusha
One Punch Man 2nd Season
Pandora Voxx Complete
Double Decker! Doug & Kirill
The Samurai


In [36]:
recommend('Death Note')

Death Note: Rewrite
Ghost Messenger
Soul Eater
Kite Liberator
Shinigami no Ballad.
Bleach: Memories in the Rain
Yami no Shihosha Judge
Isekai wa Smartphone to Tomo ni.
Wan Jie Shen Zhu
Platinum End
Persona 3 the Movie 4: Winter of Rebirth
Yume Senshi Wingman
Da Yu Hai Tang (Movie)
Koutetsujou no Kabaneri
