# Anime Recommendation System
---

# Background  
This project uses the dataset at https://www.kaggle.com/CooperUnion/anime-recommendations-database?select=anime.csv  
The dataset contains user preference data from 73,516 users on 12,294 anime.

# Project Objective  
Build an anime recommendation system using the 'Similarity Measure' and 'Content Based Recommendation System' methods.

---

# Import

## Library

In [40]:
import pandas as pd 
import numpy as np 
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Dataset

In [41]:
df = pd.read_csv('anime.csv')
df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


## Dataset Content  

1. anime_id  
Unique code from myanimelist.net that identifies the anime  

2. name  
Name of the anime

3. genre  
Genres of the anime, separated by commas  

4. type  
Type of the anime (movie, TV, OVA, etc.)

5. episodes  
Number of episodes of the anime  

6. rating  
Average rating given by users  

7. members  
Number of community members for the anime

---

# Preprocessing

In [42]:
df.isna().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [43]:
df = df.dropna().reset_index(drop=True)

---

# Content Based Recommendation System

A content based system builds recommendations from the similarity of item profiles, for example title, genre, release year or author. In this project, recommendations are based on the similarity of anime genres.

Two content based recommendation methods are used:
- similarity measure: measures the similarity between items using cosine similarity
- content based filtering one user: makes use of the profiles of a number of items

# Cosine Similarity

A recommendation system for a user based on similarity to an anime they like

How does cosine similarity work?

![cosine.png](cosine.png)    

Suppose we have four films and take only two genres, so that we can visualize how cosine similarity works. We plot comedy on the x-axis and action on the y-axis. Taking (0,0) as the origin, we place each film at its coordinates and draw a straight line from the origin to each film. **The angle between two films reflects how similar they are**. **The smaller the angle, the more similar the films; if the lines coincide, the two films are identical in terms of content**. If the lines to two films form a right angle (90 degrees), the films have no content in common.  
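
As a minimal sketch of the geometric picture above, the snippet below computes the cosine of the angle between two-genre vectors with NumPy. The film names and coordinates are made up for illustration and are not taken from the dataset or the figures.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical films as (comedy, action) coordinates
film_a = [1, 0]   # pure comedy
film_b = [2, 0]   # also pure comedy, same direction as film_a
film_c = [0, 1]   # pure action, perpendicular to film_a
film_d = [1, 1]   # equal parts comedy and action

print(cosine(film_a, film_b))  # lines coincide
print(cosine(film_a, film_c))  # right angle
print(cosine(film_a, film_d))  # 45 degree angle
```

Only the direction of each vector matters here, so `film_a` and `film_b` count as identical even though their lengths differ.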

![image.png](cosine2.png)  

In practice, more than two genres are used as the profile. We now use more realistic data: the same list of films, but with a wider range of genres.  

We use the cosine similarity formula, where x holds the content values of one item and y holds the attribute values of another item. **For non-negative values such as genre indicators, cosine similarity ranges from 0 to 1**. **A value of 1 means the two items being compared have exactly the same content profile, and a value of 0 means they have nothing in common**.

*The data above is for illustration only*
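
For reference, the standard definition of cosine similarity is written out here, with x and y as the two item vectors described above. The symbol n, the number of genres, is introduced for this sketch and does not appear in the original text.

$$
\cos(\theta) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
= \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
$$

When every x_i and y_i is 0 or 1, the numerator counts the genres the two items share, and each factor in the denominator is the square root of the number of genres one item has.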

In [44]:
# split the genre column into individual genres
cvr = CountVectorizer(tokenizer=lambda x:x.split(', '))

genre = cvr.fit_transform(df['genre'])
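
To make the tokenizer step concrete, here is a small sketch on two made-up genre strings in the same comma-separated format. The names `toy_genres`, `toy_cvr` and `toy_matrix` are hypothetical, and `token_pattern=None` is added only to silence the warning that the default pattern is unused when a custom tokenizer is given.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy genre strings in the same "A, B, C" format as the genre column
toy_genres = [
    "Action, Comedy",
    "Comedy, Slice of Life",
]

# Split on ", " so multi-word genres such as "Slice of Life" stay in one piece
toy_cvr = CountVectorizer(tokenizer=lambda x: x.split(", "), token_pattern=None)
toy_matrix = toy_cvr.fit_transform(toy_genres)

print(sorted(toy_cvr.vocabulary_))  # learned genre names (lowercased by default)
print(toy_matrix.toarray())         # one row per string, one column per genre
```

Each row is a 0/1 genre profile, which is the form of input that the cosine similarity step works on.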

In [45]:
# build a dataframe from the split genre column
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
genre_name = pd.DataFrame(genre.toarray(), columns=cvr.get_feature_names_out())
genre_name.head()

Unnamed: 0,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,game,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,0
1,1,1,0,0,0,0,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [50]:
# combine the anime names with the genre indicator columns
df_genre = pd.concat([df['name'],genre_name], axis=1)
df_genre.head()

Unnamed: 0,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,Kimi no Na wa.,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,0
1,Fullmetal Alchemist: Brotherhood,1,1,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
2,Gintama°,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Steins;Gate,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,Gintama&#039;,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [58]:
# anime recommender using the anime Naruto
cos_score = cosine_similarity(df_genre[df_genre['name']=='Naruto'].drop('name', axis=1), df_genre.drop('name', axis=1))
df_cos_score = pd.DataFrame(cos_score, index=['CosScore']).T

# combine the cosine scores for Naruto with the names from the original df
df_anime_cos = pd.concat([df['name'], df_cos_score], axis=1)

In [61]:
# show the 5 anime with the highest cosine score
df_anime_cos[df_anime_cos['name']!='Naruto'].sort_values('CosScore', ascending=False).head()

Unnamed: 0,name,CosScore
1103,Boruto: Naruto the Movie - Naruto ga Hokage ni...,1.0
1343,Naruto x UT,1.0
486,Boruto: Naruto the Movie,1.0
615,Naruto: Shippuuden,1.0
2996,Naruto Soyokazeden Movie: Naruto to Mashin to ...,1.0


In [62]:
# wrap the steps above in a function for convenience
def recommend():
    anime = input('Enter the title of an anime you like: ')

    # one-hot encode the comma-separated genres
    cvr = CountVectorizer(tokenizer=lambda x:x.split(', '))
    genre = cvr.fit_transform(df['genre'])

    genre_name = pd.DataFrame(genre.toarray(), columns=cvr.get_feature_names_out())
    df_genre = pd.concat([df['name'],genre_name], axis=1)

    # stop early if the title is not in the dataset
    if not (df_genre['name']==anime).any():
        raise ValueError(f'Anime not found: {anime}')

    # cosine similarity between the chosen anime and every anime
    cos_score = cosine_similarity(df_genre[df_genre['name']==anime].drop('name', axis=1), df_genre.drop('name', axis=1))
    df_cos_score = pd.DataFrame(cos_score, index=['CosScore']).T
    df_anime_cos = pd.concat([df['name'], df_cos_score], axis=1)

    # top 5 matches, excluding the chosen anime itself
    return df_anime_cos[df_anime_cos['name']!=anime].sort_values('CosScore', ascending=False).head()

In [65]:
recommend()

Unnamed: 0,name,CosScore
1103,Boruto: Naruto the Movie - Naruto ga Hokage ni...,1.0
1343,Naruto x UT,1.0
486,Boruto: Naruto the Movie,1.0
615,Naruto: Shippuuden,1.0
2996,Naruto Soyokazeden Movie: Naruto to Mashin to ...,1.0


Based on the cosine similarity results, the table above lists the anime that are most similar to Naruto.

# Content Based Filtering One User

A recommendation system for a user based on the genres and ratings of the anime they like 

How does Content Based Filtering One User work?  

![image.png](content.png)  

Take user 1 as an example: this user has rated a number of films, and there are several films they have not watched yet. We want to find out which film they would want to watch next.

![image.png](content2.png)

![image.png](content3.png)  


*The data above is for illustration only*
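
The idea can be sketched on a tiny made-up example: weight the genres of the watched films by the user's ratings, normalise the totals into a user feature vector, then score each unseen film against that vector. All film names, genres and ratings below are hypothetical and are not the values shown in the figures.

```python
import pandas as pd

# Hypothetical genre indicators (1 = film has the genre); not from the dataset
films = pd.DataFrame(
    {"action": [1, 0, 1, 0], "comedy": [1, 1, 0, 0], "drama": [0, 1, 0, 1]},
    index=["Film A", "Film B", "Film C", "Film D"],
)

# Hypothetical ratings from user 1 for the films already watched
user_ratings = pd.Series({"Film A": 8.0, "Film B": 6.0})

# Weight each watched film's genres by its rating, then sum per genre
watched = films.loc[user_ratings.index]
genre_totals = watched.mul(user_ratings, axis=0).sum()

# Normalise so the weights sum to 1: this is the user feature vector
user_profile = genre_totals / genre_totals.sum()

# Score each unseen film as the dot product of its genres with the profile
unseen = films.drop(index=user_ratings.index)
scores = unseen.dot(user_profile).sort_values(ascending=False)

print(user_profile)
print(scores)
```

The dot product is the same computation as multiplying each genre column by its weight and summing across the row, only written as a single step.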

In [66]:
# split the genre column into individual genres
cvr = CountVectorizer(tokenizer=lambda x:x.split(', '))

genre = cvr.fit_transform(df['genre'])

In [67]:
# build a dataframe from the split genre column
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
genre_name = pd.DataFrame(genre.toarray(), columns=cvr.get_feature_names_out())
genre_name.head()

Unnamed: 0,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,game,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,0
1,1,1,0,0,0,0,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [68]:
# combine the anime names and ratings with the genre indicator columns
df_rating_genre = pd.concat([df[['name','rating']],genre_name], axis=1)
df_rating_genre.head()

Unnamed: 0,name,rating,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,Kimi no Na wa.,9.37,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
1,Fullmetal Alchemist: Brotherhood,9.26,1,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Gintama°,9.25,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Steins;Gate,9.17,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,Gintama&#039;,9.16,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [70]:
# anime recommender using Naruto, One Piece and Dragon Ball
# .copy() makes an independent dataframe, so the assignments below do not trigger SettingWithCopyWarning
df_rating = df_rating_genre[df_rating_genre['name'].isin(['Naruto','One Piece','Dragon Ball'])].copy()

In [71]:
# weight each genre indicator by the anime's rating (the dataset's average rating stands in for the user's rating)
for i in cvr.get_feature_names_out():
    df_rating[i] = df_rating['rating']*df_rating[i]

df_rating


Unnamed: 0,name,rating,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
74,One Piece,8.58,8.58,8.58,0.0,8.58,0.0,0.0,8.58,0.0,...,0.0,0.0,0.0,0.0,8.58,0.0,0.0,0.0,0.0,0.0
346,Dragon Ball,8.16,0.0,8.16,0.0,8.16,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,8.16,0.0,0.0,0.0,0.0,0.0
841,Naruto,7.81,7.81,0.0,0.0,7.81,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,7.81,0.0,0.0,0.0,0.0,0.0


In [72]:
# User Feature Vector - the user's preference for each genre in the content they watched
df_rating = df_rating.drop(['name','rating'], axis=1)

# share of the total rating weight that falls on each genre (sums to 1)
anime_scoring = df_rating.sum()/df_rating.sum().sum()
anime_scoring


action           0.110691
adventure        0.113055
cars             0.000000
comedy           0.165800
dementia         0.000000
demons           0.000000
drama            0.057946
ecchi            0.000000
fantasy          0.113055
game             0.000000
harem            0.000000
hentai           0.000000
historical       0.000000
horror           0.000000
josei            0.000000
kids             0.000000
magic            0.000000
martial arts     0.107854
mecha            0.000000
military         0.000000
music            0.000000
mystery          0.000000
parody           0.000000
police           0.000000
psychological    0.000000
romance          0.000000
samurai          0.000000
school           0.000000
sci-fi           0.000000
seinen           0.000000
shoujo           0.000000
shoujo ai        0.000000
shounen          0.165800
shounen ai       0.000000
slice of life    0.000000
space            0.000000
sports           0.000000
super power      0.165800
supernatural

In [76]:
# build a dataframe without Naruto, One Piece and Dragon Ball
df_recommend = df_rating_genre[~df_rating_genre['name'].isin(['Naruto','One Piece','Dragon Ball'])].drop('rating', axis=1)
df_recommend.head()

Unnamed: 0,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,Kimi no Na wa.,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,0,0,0,0
1,Fullmetal Alchemist: Brotherhood,1,1,0,0,0,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
2,Gintama°,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Steins;Gate,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,Gintama&#039;,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [77]:
# multiply each genre column in df_recommend by its weight in anime_scoring
for i in cvr.get_feature_names_out():
    df_recommend[i] = df_recommend[i]*anime_scoring[i]

df_recommend.head()

Unnamed: 0,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,Kimi no Na wa.,0.0,0.0,0.0,0.0,0.0,0.0,0.057946,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Fullmetal Alchemist: Brotherhood,0.110691,0.113055,0.0,0.0,0.0,0.0,0.057946,0.0,0.113055,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Gintama°,0.110691,0.0,0.0,0.1658,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Steins;Gate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Gintama&#039;,0.110691,0.0,0.0,0.1658,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [78]:
# add a new column named anime rating prediction
# numeric_only=True keeps the text column 'name' out of the row sum
df_recommend['anime rating prediction'] = df_recommend.sum(axis=1, numeric_only=True)
df_recommend[['name','anime rating prediction']].head()

Unnamed: 0,name,anime rating prediction
0,Kimi no Na wa.,0.057946
1,Fullmetal Alchemist: Brotherhood,0.560546
2,Gintama°,0.442291
3,Steins;Gate,0.0
4,Gintama&#039;,0.442291


In [79]:
# show the 5 anime with the highest anime rating prediction
df_recommend[['name','anime rating prediction']].sort_values('anime rating prediction', ascending=False).head()

Unnamed: 0,name,anime rating prediction
5997,Dragon Ball Z Movie 11: Super Senshi Gekiha!! ...,0.942054
1409,Dragon Ball Z Movie 15: Fukkatsu no F,0.942054
1931,Dragon Ball: Episode of Bardock,0.942054
1930,Dragon Ball Super,0.942054
3407,Dragon Ball Z: Zenbu Misemasu Toshi Wasure Dra...,0.942054


In [82]:
# wrap the steps above in a function for convenience
def recommendMe():
    anime_list=input('Enter the titles of the anime you like, separated by a comma and a space: ').split(', ')

    # one-hot encode the comma-separated genres
    cvr = CountVectorizer(tokenizer=lambda x:x.split(', '))
    genre = cvr.fit_transform(df['genre'])
    genre_cols = cvr.get_feature_names_out()

    # build genre_name locally instead of relying on the global from an earlier cell
    genre_name = pd.DataFrame(genre.toarray(), columns=genre_cols)
    df_rating_genre = pd.concat([df[['name','rating']],genre_name], axis=1)

    # weight the genres of the liked anime by their ratings; .copy() avoids SettingWithCopyWarning
    df_rating = df_rating_genre[df_rating_genre['name'].isin(anime_list)].copy()
    for i in genre_cols:
        df_rating[i] = df_rating['rating']*df_rating[i]

    # user feature vector: share of the total rating weight on each genre
    df_rating = df_rating.drop(['name','rating'], axis=1)
    anime_scoring = df_rating.sum()/df_rating.sum().sum()

    # score every other anime by the genre weights it matches
    df_recommend = df_rating_genre[~df_rating_genre['name'].isin(anime_list)].drop('rating', axis=1)

    for i in genre_cols:
        df_recommend[i] = df_recommend[i]*anime_scoring[i]

    df_recommend['anime rating prediction'] = df_recommend.sum(axis=1, numeric_only=True)

    return df_recommend[['name','anime rating prediction']].sort_values('anime rating prediction', ascending=False).head()

In [83]:
recommendMe()


Unnamed: 0,name,anime rating prediction
5997,Dragon Ball Z Movie 11: Super Senshi Gekiha!! ...,0.942054
1409,Dragon Ball Z Movie 15: Fukkatsu no F,0.942054
1931,Dragon Ball: Episode of Bardock,0.942054
1930,Dragon Ball Super,0.942054
3407,Dragon Ball Z: Zenbu Misemasu Toshi Wasure Dra...,0.942054


Based on the content based filtering one user results, the table above lists the anime that are most similar to Naruto, One Piece and Dragon Ball.