## **Exercise: Recommendation System (Content-Based Filtering)**

**Use the anime dataset**

**Content-based filtering for one user**

- Drop missing values
- Vectorize the genres to get one value per genre for each anime (item-feature matrix with rating)
- Pick any 3 anime that the user likes
- Build the user feature vector
- Find 10 anime recommendations for the user

## **Import libraries**

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

import warnings
warnings.filterwarnings('ignore')

## **Load dataset**

In [2]:
df = pd.read_csv('anime.csv')
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


## **Preprocessing**

In [3]:
df = df.loc[:, ['anime_id', 'name', 'genre', 'rating']]
df

Unnamed: 0,anime_id,name,genre,rating
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",9.37
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",9.26
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",9.25
3,9253,Steins;Gate,"Sci-Fi, Thriller",9.17
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",9.16
...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,4.15
12290,5543,Under World,Hentai,4.28
12291,5621,Violence Gekiga David no Hoshi,Hentai,4.88
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,4.98


In [4]:
df.isna().sum()

anime_id      0
name          0
genre        62
rating      230
dtype: int64

In [5]:
df = df.dropna()

In [6]:
df.shape

(12017, 4)

In [7]:
df = df.reset_index(drop=True)
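
Resetting the index matters for the next step: `pd.concat(..., axis=1)` aligns rows by index label, and `dropna()` leaves gaps in those labels. A minimal sketch on hypothetical toy data (the names `toy`, `clean`, and `features` are illustrative and not part of the notebook) shows the difference:

```python
import pandas as pd

# Hypothetical toy frame with one missing genre
toy = pd.DataFrame({'name': ['A', 'B', 'C'],
                    'genre': ['Action', None, 'Drama']})

clean = toy.dropna()                      # remaining index labels: 0 and 2
features = pd.DataFrame({'f': [1, 2]})    # fresh index labels: 0 and 1

# Without reset_index the labels do not line up, so NaN rows appear
misaligned = pd.concat([clean, features], axis=1)

# With reset_index both frames share the labels 0 and 1
aligned = pd.concat([clean.reset_index(drop=True), features], axis=1)

print(misaligned)
print(aligned)
```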

# **Content Based Filtering for One User**

## **Create Item-Feature matrix**

In [8]:
# Generate item feature matrix
vect = CountVectorizer(tokenizer=lambda x:x.split(', '))

df_genre = vect.fit_transform(df['genre'])
df_genre = pd.DataFrame(df_genre.toarray(), columns=vect.get_feature_names_out())  # get_feature_names() on scikit-learn < 1.0
df_genre = pd.concat([df[['anime_id', 'name', 'rating']], df_genre], axis=1)
df_genre 

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,9.37,0,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,9.26,1,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,28977,Gintama°,9.25,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9253,Steins;Gate,9.17,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,9969,Gintama&#039;,9.16,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,9316,Toushindai My Lover: Minami tai Mecha-Minami,4.15,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12013,5543,Under World,4.28,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12014,5621,Violence Gekiga David no Hoshi,4.88,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12015,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,4.98,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
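
As a cross-check, similar binary genre columns can be built with pandas alone. The sketch below applies `Series.str.get_dummies` to three rows from the dataset; the variable names `toy` and `toy_genre` are hypothetical. Unlike `CountVectorizer`, this approach keeps the original capitalization of the genre labels.

```python
import pandas as pd

# Three rows copied from the anime dataset
toy = pd.DataFrame({
    'name': ['Kimi no Na wa.', 'Steins;Gate', 'Hunter x Hunter (2011)'],
    'genre': ['Drama, Romance, School, Supernatural',
              'Sci-Fi, Thriller',
              'Action, Adventure, Shounen, Super Power'],
})

# One column per genre, 1 if the anime has that genre and 0 otherwise
toy_genre = toy['genre'].str.get_dummies(sep=', ')

print(pd.concat([toy[['name']], toy_genre], axis=1))
```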


## **Select 3 anime you like**

Pick the 3 anime (items) you like most.
The recommender system will then recommend the anime most similar to these 3 favorites.

In [62]:
# Browse anime names and genres
search = 'Doraemon'
  
result = df['name'].str.contains(search) 
df[result].head(30)

Unnamed: 0,anime_id,name,genre,rating
399,21469,Stand By Me Doraemon,"Comedy, Kids, Sci-Fi, Shounen",8.12
929,2471,Doraemon (1979),"Adventure, Comedy, Fantasy, Kids, Sci-Fi, Shounen",7.76
993,10534,Doraemon Movie 31: Shin Nobita to Tetsujin Hei...,"Adventure, Comedy, Fantasy, Kids, Shounen",7.73
1160,2673,Doraemon Movie 27: Nobita no Shin Makai Daibou...,"Adventure, Comedy, Fantasy",7.65
1309,8687,Doraemon (2005),"Comedy, Kids, Sci-Fi, Shounen",7.59
1354,7045,Doraemon Movie 21: Nobita no Taiyou Ou Densetsu,"Adventure, Comedy, Fantasy, Historical, Kids, ...",7.57
1378,2392,Doraemon Movie 26: Nobita no Kyouryuu 2006,"Adventure, Comedy, Kids",7.56
1443,5096,Doraemon Movie 28: Nobita to Midori no Kyojin Den,"Adventure, Comedy, Fantasy, Kids, Shounen",7.54
1526,2672,Doraemon Movie 05: Nobita no Makai Daibouken,Fantasy,7.51
1587,501,Doraemon,"Adventure, Comedy, Fantasy, Kids, Shounen",7.49
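
Note that `str.contains` interprets the search string as a regular expression by default and is case-sensitive. For titles that contain characters such as parentheses, a literal, case-insensitive search may be what is wanted. A small sketch on three titles from the dataset (the variable names are hypothetical):

```python
import pandas as pd

names = pd.Series(['Doraemon (1979)', 'Stand By Me Doraemon', 'Naruto'])

search = 'doraemon (1979)'

# regex=False treats the parentheses literally; case=False ignores capitalization
mask = names.str.contains(search, case=False, regex=False)

print(names[mask])
```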


In [10]:
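# Anime the user has already watched
# .copy() so that the rating weights applied later do not write into a slice of df_genre
df_choice = df_genre[df_genre['name'].isin(['Naruto', 'One Piece', 'Dragon Ball'])].copy()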
df_choice 

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
74,21,One Piece,8.58,1,1,0,1,0,0,1,...,0,0,0,0,1,0,0,0,0,0
346,223,Dragon Ball,8.16,0,1,0,1,0,0,0,...,0,0,0,0,1,0,0,0,0,0
841,20,Naruto,7.81,1,0,0,1,0,0,0,...,0,0,0,0,1,0,0,0,0,0


## **Item-feature matrix with rating**

In [11]:
# Replace each 1 in the genre columns with the anime's rating
for i in vect.get_feature_names_out():  # get_feature_names() on scikit-learn < 1.0
    df_choice[i] = df_choice['rating'] * df_choice[i]

df_choice

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
74,21,One Piece,8.58,8.58,8.58,0.0,8.58,0.0,0.0,8.58,...,0.0,0.0,0.0,0.0,8.58,0.0,0.0,0.0,0.0,0.0
346,223,Dragon Ball,8.16,0.0,8.16,0.0,8.16,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,8.16,0.0,0.0,0.0,0.0,0.0
841,20,Naruto,7.81,7.81,0.0,0.0,7.81,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,7.81,0.0,0.0,0.0,0.0,0.0


## **Calculating user feature vector**

In [12]:
# Sum the values in each genre (content) column
df_choice[vect.get_feature_names_out()].sum()

action           16.39
adventure        16.74
cars              0.00
comedy           24.55
dementia          0.00
demons            0.00
drama             8.58
ecchi             0.00
fantasy          16.74
game              0.00
harem             0.00
hentai            0.00
historical        0.00
horror            0.00
josei             0.00
kids              0.00
magic             0.00
martial arts     15.97
mecha             0.00
military          0.00
music             0.00
mystery           0.00
parody            0.00
police            0.00
psychological     0.00
romance           0.00
samurai           0.00
school            0.00
sci-fi            0.00
seinen            0.00
shoujo            0.00
shoujo ai         0.00
shounen          24.55
shounen ai        0.00
slice of life     0.00
space             0.00
sports            0.00
super power      24.55
supernatural      0.00
thriller          0.00
vampire           0.00
yaoi              0.00
yuri              0.00
dtype: float64

In [13]:
# Normalize the values so that they sum to 1 (100%)
user_feature_vector = df_choice[vect.get_feature_names_out()].sum()/df_choice[vect.get_feature_names_out()].sum().sum()
user_feature_vector

action           0.110691
adventure        0.113055
cars             0.000000
comedy           0.165800
dementia         0.000000
demons           0.000000
drama            0.057946
ecchi            0.000000
fantasy          0.113055
game             0.000000
harem            0.000000
hentai           0.000000
historical       0.000000
horror           0.000000
josei            0.000000
kids             0.000000
magic            0.000000
martial arts     0.107854
mecha            0.000000
military         0.000000
music            0.000000
mystery          0.000000
parody           0.000000
police           0.000000
psychological    0.000000
romance          0.000000
samurai          0.000000
school           0.000000
sci-fi           0.000000
seinen           0.000000
shoujo           0.000000
shoujo ai        0.000000
shounen          0.165800
shounen ai       0.000000
slice of life    0.000000
space            0.000000
sports           0.000000
super power      0.165800
supernatural
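
The arithmetic behind these numbers can be reproduced by hand. The sketch below first computes the rating-weighted sums for three of the 43 genres as a matrix product, then normalizes the eight non-zero sums printed above. The variable names are hypothetical; the ratings and genre flags are taken from the tables above.

```python
import numpy as np

# Ratings of the three chosen anime
ratings = np.array([8.58, 8.16, 7.81])   # One Piece, Dragon Ball, Naruto

# Binary genre flags for a subset of genres: action, adventure, comedy
item_feature = np.array([
    [1, 1, 1],   # One Piece
    [0, 1, 1],   # Dragon Ball
    [1, 0, 1],   # Naruto
])

# Rating-weighted genre sums: each genre collects the ratings of the anime that have it
weighted_sum = ratings @ item_feature
print(weighted_sum)   # action, adventure, comedy

# Normalization uses the total over all genres; only eight of them are non-zero
genre_sums = {
    'action': 16.39, 'adventure': 16.74, 'comedy': 24.55, 'drama': 8.58,
    'fantasy': 16.74, 'martial arts': 15.97, 'shounen': 24.55, 'super power': 24.55,
}
total = sum(genre_sums.values())
user_vec = {genre: value / total for genre, value in genre_sums.items()}
print(round(user_vec['action'], 6), round(user_vec['comedy'], 6))
```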

## **Create recommendation**

Multiply the user feature vector with the genre columns of every anime, so that each title's genres carry the user's weights derived from Naruto, One Piece, and Dragon Ball.

In [14]:
# Dataframe without Naruto, One Piece, Dragon Ball (the anime not yet watched)
df_recommend = df_genre.loc[~df_genre['name'].isin(['Naruto', 'One Piece', 'Dragon Ball'])].copy()

In [15]:
# Multiply each genre column by the matching weight from the user feature vector
for i in vect.get_feature_names_out():  # get_feature_names() on scikit-learn < 1.0
    df_recommend[i]= df_recommend[i] * user_feature_vector[i]
    
df_recommend

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,9.37,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.057946,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,5114,Fullmetal Alchemist: Brotherhood,9.26,0.110691,0.113055,0.0,0.0000,0.0,0.0,0.057946,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,28977,Gintama°,9.25,0.110691,0.000000,0.0,0.1658,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,9253,Steins;Gate,9.17,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,9969,Gintama&#039;,9.16,0.110691,0.000000,0.0,0.1658,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,9316,Toushindai My Lover: Minami tai Mecha-Minami,4.15,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12013,5543,Under World,4.28,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12014,5621,Violence Gekiga David no Hoshi,4.88,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12015,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,4.98,0.000000,0.000000,0.0,0.0000,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### **Create predicted score column**

In [16]:
# Predicted score = sum of an anime's weighted genre values
df_recommend['predicted_score'] = df_recommend.drop(columns=['rating', 'anime_id', 'name']).sum(axis=1)  # sum only the genre columns
df_recommend.head()

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri,predicted_score
0,32281,Kimi no Na wa.,9.37,0.0,0.0,0.0,0.0,0.0,0.0,0.057946,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057946
1,5114,Fullmetal Alchemist: Brotherhood,9.26,0.110691,0.113055,0.0,0.0,0.0,0.0,0.057946,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.560546
2,28977,Gintama°,9.25,0.110691,0.0,0.0,0.1658,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.442291
3,9253,Steins;Gate,9.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,9969,Gintama&#039;,9.16,0.110691,0.0,0.0,0.1658,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.442291
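
Multiplying each genre column by its weight and then summing across the row is the same as a matrix-vector product between the binary item-feature matrix and the user feature vector. A minimal sketch with four of the genre weights from above; the third item is hypothetical and only there to show several genres adding up:

```python
import numpy as np

# Weights for action, adventure, comedy, drama from the user feature vector
user_vec = np.array([0.110691, 0.113055, 0.165800, 0.057946])

item_feature = np.array([
    [0, 0, 0, 1],   # Kimi no Na wa.: only drama overlaps with the user's genres
    [0, 0, 0, 0],   # Steins;Gate: no overlap
    [1, 1, 1, 0],   # hypothetical item with action, adventure, comedy
])

# One predicted score per item
scores = item_feature @ user_vec
print(scores)
```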


In [17]:
# The 20 anime with the highest scores (the most recommended titles)
df_recommend[['name', 'predicted_score']].sort_values('predicted_score', ascending=False).head(20)

Unnamed: 0,name,predicted_score
5997,Dragon Ball Z Movie 11: Super Senshi Gekiha!! ...,0.942054
1409,Dragon Ball Z Movie 15: Fukkatsu no F,0.942054
1931,Dragon Ball: Episode of Bardock,0.942054
1930,Dragon Ball Super,0.942054
3407,Dragon Ball Z: Zenbu Misemasu Toshi Wasure Dra...,0.942054
3202,Dragon Ball Z: Summer Vacation Special,0.942054
4312,Dragon Ball GT: Goku Gaiden! Yuuki no Akashi w...,0.942054
4273,Dragon Ball Z: Atsumare! Gokuu World,0.942054
588,Dragon Ball Kai,0.942054
206,Dragon Ball Z,0.942054


### **Interpretasi**

So, based on the 3 chosen anime (Naruto, One Piece, and Dragon Ball), the top 20 recommendations by predicted score are listed in the dataframe above. Most of them are variants or other installments of Dragon Ball and One Piece.

In [18]:
# All of the code above wrapped in a function
def recommend_me():
    anime = input('Enter the anime titles you like, separated by a comma and a space: ').split(', ')
    
    import pandas as pd
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    
    # Create genre dataframe
    df = pd.read_csv('anime.csv')
    df = df.loc[:, ['anime_id', 'name', 'genre', 'rating']].dropna().reset_index(drop=True)
    
    # Preprocessing
    vect = CountVectorizer(tokenizer=lambda x:x.split(', '))
    df_genre = vect.fit_transform(df['genre'])
    genres = vect.get_feature_names_out()   # get_feature_names() on scikit-learn < 1.0
    df_genre = pd.DataFrame(df_genre.toarray(), columns=genres)
    df_genre = pd.concat([df[['anime_id', 'name', 'rating']], df_genre],axis=1)
    
    # User feature vector
    df_choice = df_genre[df_genre['name'].isin(anime)].copy()
    
    for i in genres:
        df_choice[i]= df_choice['rating']*df_choice[i]
    
    user_feature_vector = df_choice[genres].sum()/df_choice[genres].sum().sum()
    
    # Predict score
    df_recommend = df_genre[~df_genre['name'].isin(anime)].drop(columns='rating')
    
    for i in genres:
        df_recommend[i] = df_recommend[i]*user_feature_vector[i]
        
    # Sum only the genre columns; 'anime_id' and 'name' are not scores
    df_recommend['predicted_score'] = df_recommend[genres].sum(axis=1)
    
    return df_recommend[['name', 'predicted_score']].sort_values('predicted_score', ascending=False).head(20)

In [19]:
# recommend_me()

# **Cosine Similarity**

In [20]:
from sklearn.metrics.pairwise import cosine_distances

In [21]:
# Anime already watched: Kimi no Na wa.
df_genre.iloc[:1, :]

Unnamed: 0,anime_id,name,rating,action,adventure,cars,comedy,dementia,demons,drama,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,9.37,0,0,0,0,0,0,1,...,0,0,0,0,0,1,0,0,0,0


In [22]:
# Item-feature matrix (1x43)
my_anime = df_genre.iloc[:1, 3:]
my_anime

Unnamed: 0,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,game,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,1,0,0,0,0


In [23]:
# Compute the cosine distance between Kimi no Na wa. and every anime
dist = cosine_distances(my_anime, df_genre.iloc[: , 3:])
dist

array([[0.        , 0.81101776, 1.        , ..., 1.        , 1.        ,
        1.        ]])
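
Cosine distance is one minus the cosine similarity, so 0 means identical genre profiles and 1 means no shared genre. The sketch below implements the formula directly with NumPy on two hypothetical binary vectors: one with four genres and one with seven genres, sharing exactly one. The helper name `cosine_distance` is an assumption for illustration, and the sketch does not guard against all-zero vectors.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical genre vectors over 11 genres
a = np.zeros(11)
a[:4] = 1        # four genres (positions 0 to 3)
b = np.zeros(11)
b[3:10] = 1      # seven genres (positions 3 to 9), sharing only position 3

d_same = cosine_distance(a, a)   # identical profiles
d_ab = cosine_distance(a, b)     # 1 - 1 / (sqrt(4) * sqrt(7))
print(d_same, d_ab)
```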

In [66]:
top10_index = dist.argsort()[0, 0:11]
top10_index

array([   0, 5803, 6391, 1111, 5029, 3529, 4512, 5125, 5231, 1201,  208],
      dtype=int64)

In [67]:
# The anime to recommend (most similar to Kimi no Na wa.)
# Position 0 is Kimi no Na wa. itself, which is why 11 indices are taken for 10 recommendations
df_genre.loc[top10_index]

Unnamed: 0,anime_id,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
5803,547,Wind: A Breath of Heart OVA,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
6391,546,Wind: A Breath of Heart (TV),0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
1111,14669,Aura: Maryuuin Kouga Saigo no Tatakai,0,0,0,1,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
5029,1039,Mizuiro (2003),0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
3529,9988,Otome wa Boku ni Koishiteru: Futari no Elder,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4512,2105,Touka Gettan,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
5125,1607,Venus Versus Virus,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
5231,1624,To Heart 2 Special,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1201,10067,Angel Beats!: Another Epilogue,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0


# **Content Based Filtering for Multiple Users**

**Recommend 10 anime to each user**

Build the following:
- Item features: anime_id, name, genre
- A user-item rating matrix containing 4 users and 4 anime that all four users have already rated (random values are fine)
        
        Example:
        
        - 'user': ['user 1', 'user 2', 'user 3', 'user 4'],
        - 26055: [7, 8, 8, 9],
        - 11061: [9, 7, 8, 7],
        - 2001: [7, 7, 9, 7],
        - 31933: [8, 9, 7, 8]
            
- An item-feature matrix for all anime, from which the 4 rated anime above are selected
- The user feature matrix and its user feature vectors
- Recommendations for the anime that have not been watched yet
- A sorted and filtered list of 10 recommended anime for each user

In [26]:
search = 'Hunter x Hunter'
  
result = df['name'].str.contains(search) 
df[result] 

Unnamed: 0,anime_id,name,genre,rating
6,11061,Hunter x Hunter (2011),"Action, Adventure, Shounen, Super Power",9.13
112,136,Hunter x Hunter,"Action, Adventure, Shounen, Super Power",8.48
145,137,Hunter x Hunter OVA,"Action, Adventure, Shounen, Super Power",8.41
146,139,Hunter x Hunter: Greed Island Final,"Action, Adventure, Shounen, Super Power",8.41
202,138,Hunter x Hunter: Greed Island,"Action, Adventure, Shounen, Super Power",8.33
1974,13271,Hunter x Hunter Movie: Phantom Rouge,"Action, Adventure, Shounen, Super Power",7.39
2046,10189,Hunter x Hunter Pilot,"Action, Adventure, Shounen, Super Power",7.37
2108,19951,Hunter x Hunter Movie: The Last Mission,"Action, Adventure, Shounen, Super Power",7.35


In [27]:
search = 'Ultra'
  
result = df['name'].str.contains(search) 
df[result] 

Unnamed: 0,anime_id,name,genre,rating
2313,178,Ultra Maniac,"Comedy, Magic, Romance, School, Shoujo",7.3
4041,179,Ultra Maniac OVA,"Comedy, Magic, Romance, School",6.82
5187,3444,The Ultraman,"Action, Sci-Fi, Shounen, Space, Super Power",6.53
5501,6731,Ultraman Kids no Kotowaza Monogatari,"Comedy, Kids, Super Power",6.44
6499,4264,Ultraviolet: Code 044,"Action, Sci-Fi",6.09
6540,12851,Ultraman USA,"Action, Adventure, Sci-Fi",6.07
7042,33011,Kaijuu Girls: Ultra Kaijuu Gijinka Keikaku,"Comedy, Fantasy, Parody",5.99
8868,21549,Hitotsuboshi-ke no Ultra Baasan,"Comedy, Slice of Life",4.88
10606,10250,Ultra B,"Comedy, Super Power",5.52
10607,10244,Ultra B: Black Hole kara no Dokusaisha BB!!,"Adventure, Comedy",5.54


In [28]:
search = 'Super Power'
  
result = df['genre'].str.contains(search) 
df[result].head(30)

Unnamed: 0,anime_id,name,genre,rating
6,11061,Hunter x Hunter (2011),"Action, Adventure, Shounen, Super Power",9.13
13,2904,Code Geass: Hangyaku no Lelouch R2,"Action, Drama, Mecha, Military, Sci-Fi, Super ...",8.98
19,1575,Code Geass: Hangyaku no Lelouch,"Action, Mecha, Military, School, Sci-Fi, Super...",8.83
23,30276,One Punch Man,"Action, Comedy, Parody, Sci-Fi, Seinen, Super ...",8.82
55,4565,Tengen Toppa Gurren Lagann Movie: Lagann-hen,"Action, Mecha, Sci-Fi, Space, Super Power",8.64
74,21,One Piece,"Action, Adventure, Comedy, Drama, Fantasy, Sho...",8.58
86,16498,Shingeki no Kyojin,"Action, Drama, Fantasy, Shounen, Super Power",8.54
112,136,Hunter x Hunter,"Action, Adventure, Shounen, Super Power",8.48
145,137,Hunter x Hunter OVA,"Action, Adventure, Shounen, Super Power",8.41
146,139,Hunter x Hunter: Greed Island Final,"Action, Adventure, Shounen, Super Power",8.41


In [29]:
df[df['anime_id'] == 11061]

Unnamed: 0,anime_id,name,genre,rating
6,11061,Hunter x Hunter (2011),"Action, Adventure, Shounen, Super Power",9.13


## **Create a copy of df**
We start by copying the dataframe.

In [30]:
df_multiple = df.copy()

### **Choose only anime_id, name, genre columns and drop missing values**
Keep only the required columns.

In [31]:
df_multiple = df_multiple[['anime_id', 'name', 'genre']].dropna().reset_index(drop=True)
df_multiple

Unnamed: 0,anime_id,name,genre
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural"
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili..."
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S..."
3,9253,Steins;Gate,"Sci-Fi, Thriller"
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S..."
...,...,...,...
12012,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai
12013,5543,Under World,Hentai
12014,5621,Violence Gekiga David no Hoshi,Hentai
12015,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai


## **Create User-Item Rating matrix**
Create a dataframe of 4 users and 4 anime that each user has (we assume) already watched and rated.

In [32]:
df_user_item = pd.DataFrame({
    'user': ['user 1', 'user 2', 'user 3', 'user 4'],
    26055: [7, 8, 8, 9],
    11061: [9, 7, 8, 7],
    2001: [7, 7, 9, 7],
    31933: [8, 9, 7, 8]
})

df_user_item

Unnamed: 0,user,26055,11061,2001,31933
0,user 1,7,9,7,8
1,user 2,8,7,7,9
2,user 3,8,8,9,7
3,user 4,9,7,7,8


## **Create Item-Feature matrix**
A matrix containing the anime names and their genres.

In [33]:
vect = CountVectorizer(tokenizer=lambda x:x.split(', '))

df_genre = vect.fit_transform(df_multiple['genre'])
df_genre = pd.DataFrame(df_genre.toarray(), columns=vect.get_feature_names_out())  # get_feature_names() on scikit-learn < 1.0
df_genre = pd.concat([df_multiple[['anime_id', 'name']], df_genre], axis=1)
df_genre 

Unnamed: 0,anime_id,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,1,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,28977,Gintama°,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9253,Steins;Gate,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,9969,Gintama&#039;,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,9316,Toushindai My Lover: Minami tai Mecha-Minami,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12013,5543,Under World,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12014,5621,Violence Gekiga David no Hoshi,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12015,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
df_user_item.columns[1:]

Index([26055, 11061, 2001, 31933], dtype='object')

In [35]:
# Item-feature matrix for the anime that have already been watched

df_item_feature = df_genre.loc[df_genre['anime_id'].isin(df_user_item.columns[1:])]
df_item_feature

Unnamed: 0,anime_id,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
6,11061,Hunter x Hunter (2011),1,1,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
29,2001,Tengen Toppa Gurren Lagann,1,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
64,26055,JoJo no Kimyou na Bouken: Stardust Crusaders 2...,1,1,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
76,31933,JoJo no Kimyou na Bouken: Diamond wa Kudakenai,1,1,0,1,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0


## **Create user feature vector**

The user feature vector shows each user's preference for particular genres.

In [36]:
user_item = np.array(df_user_item.drop(columns='user'))                         # user-item matrix as an array
item_feature = np.array(df_item_feature.drop(columns=['anime_id', 'name']))     # item-feature matrix as an array

n_user = user_item.shape[0]         # number of users
n_item = user_item.shape[1]         # number of items (anime)
n_feature = item_feature.shape[1]   # number of features/contents (genres)

user_feature = np.empty((n_user, n_feature))  # uninitialized 4x43 array (4 rows, 43 columns)

for i in range(n_user):
    user_feature_vector = np.matmul(user_item[i,:], item_feature)         # matrix product: 1D user-item array (4 entries) times item-feature matrix (4x43) gives a 1D array of 43 entries
    user_feature_vector = user_feature_vector/user_feature_vector.sum()   # normalize so that the values sum to 1
    user_feature[i, :] = user_feature_vector                              # row i holds the feature vector of user i
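
`np.matmul` pairs each rating with an item-feature row purely by position, so the column order of `df_user_item` has to match the row order of `df_item_feature`. In the tables above the two orders differ (columns 26055, 11061, 2001, 31933 versus rows 11061, 2001, 26055, 31933). A minimal sketch of aligning them by `anime_id` before multiplying, using only three of the 43 genre columns and hypothetical names (`rated_ids`, `aligned`, `raw`, `user_feature_aligned`), might look like this:

```python
import numpy as np
import pandas as pd

df_user_item = pd.DataFrame({
    'user': ['user 1', 'user 2', 'user 3', 'user 4'],
    26055: [7, 8, 8, 9],
    11061: [9, 7, 8, 7],
    2001: [7, 7, 9, 7],
    31933: [8, 9, 7, 8],
})

# Three genre columns of the four rated anime, in the row order shown above
df_item_feature = pd.DataFrame({
    'anime_id': [11061, 2001, 26055, 31933],
    'action': [1, 1, 1, 1],
    'comedy': [0, 1, 0, 1],
    'super power': [1, 0, 0, 0],
})

# Reorder the item rows so that row k corresponds to rating column k
rated_ids = list(df_user_item.columns[1:])
aligned = df_item_feature.set_index('anime_id').loc[rated_ids]

# All users at once: (4 users x 4 items) @ (4 items x 3 genres)
raw = df_user_item[rated_ids].to_numpy() @ aligned.to_numpy()
user_feature_aligned = raw / raw.sum(axis=1, keepdims=True)

print(pd.DataFrame(raw, index=df_user_item['user'], columns=aligned.columns))
```

With the rows aligned, user 1's rating of 9 lands on Hunter x Hunter (2011), the only one of the four anime with the super power genre.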

In [70]:
# user_feature = np.empty((n_user, n_feature))
# user_feature

In [73]:
# user_item[0,:]

In [75]:
# item_feature

In [77]:
# user_feature_vector = np.matmul(user_item[0,:], item_feature)
# user_feature_vector

In [41]:
# user_feature_vector.sum()

In [42]:
# user_feature, initially created with np.empty, now holds the user_feature_vector (43 columns) of each user (4 rows)
user_feature

array([[0.19871795, 0.19871795, 0.        , 0.10897436, 0.        ,
        0.        , 0.09615385, 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.05769231, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.05769231, 0.        ,
        0.        , 0.        , 0.14102564, 0.        , 0.        ,
        0.        , 0.        , 0.04487179, 0.09615385, 0.        ,
        0.        , 0.        , 0.        ],
       [0.19871795, 0.19871795, 0.        , 0.1025641 , 0.        ,
        0.        , 0.1025641 , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.04487179, 0.        ,
        0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        , 0.        , 0.04487179, 0.        ,
   

In [43]:
# Turn the user_feature matrix above into a DataFrame
df_user_vector = pd.DataFrame(user_feature, index=df_user_item['user'], columns=df_item_feature.columns[2:])
df_user_vector 

user,action,adventure,cars,comedy,dementia,demons,drama,ecchi,fantasy,game,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
user 1,0.198718,0.198718,0.0,0.108974,0.0,0.0,0.096154,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.044872,0.096154,0.0,0.0,0.0,0.0
user 2,0.198718,0.198718,0.0,0.102564,0.0,0.0,0.102564,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.051282,0.102564,0.0,0.0,0.0,0.0
user 3,0.201258,0.201258,0.0,0.09434,0.0,0.0,0.100629,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.050314,0.100629,0.0,0.0,0.0,0.0
user 4,0.201299,0.201299,0.0,0.097403,0.0,0.0,0.097403,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.058442,0.097403,0.0,0.0,0.0,0.0


We now know the genre preferences of each user.

## **Filter dataframe without 4 chosen anime**

These are the anime that none of the 4 users has watched yet.

In [44]:
# Anime that have not been watched yet
df_item_feature_new = df_genre.loc[~df_genre['anime_id'].isin(df_user_item.columns[1:])]
df_item_feature_new

Unnamed: 0,anime_id,name,action,adventure,cars,comedy,dementia,demons,drama,ecchi,...,shounen ai,slice of life,space,sports,super power,supernatural,thriller,vampire,yaoi,yuri
0,32281,Kimi no Na wa.,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,1,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,28977,Gintama°,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,9253,Steins;Gate,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,9969,Gintama&#039;,1,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12012,9316,Toushindai My Lover: Minami tai Mecha-Minami,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12013,5543,Under World,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12014,5621,Violence Gekiga David no Hoshi,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12015,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## **Create predicted score for unwatched anime**

Multiply the user feature vectors (each user's preferences) with the genre vectors of the unwatched anime.

In [45]:
# Item-feature matrix (2D array) of the anime that have not been watched
item_feature_new = np.array(df_item_feature_new.drop(columns=['anime_id', 'name']))

n_item_new = item_feature_new.shape[0]                  # number of unwatched items (anime)

user_item_score_new = np.empty((n_user, n_item_new))    # 4 x 12013 matrix

for i in range(n_user):
    user_item_score = np.matmul(item_feature_new, user_feature[i,:])  # product: item-feature matrix (12013x43) times 1D user-feature array (43 entries)
    user_item_score_new[i,:] = user_item_score                        # row i holds the scores for user i

In [46]:
user_item_score_new[0,:].shape

(12013,)

In [47]:
item_feature_new.shape

(12013, 43)

In [48]:
user_feature[0,:]

array([0.19871795, 0.19871795, 0.        , 0.10897436, 0.        ,
       0.        , 0.09615385, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.05769231, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.05769231, 0.        ,
       0.        , 0.        , 0.14102564, 0.        , 0.        ,
       0.        , 0.        , 0.04487179, 0.09615385, 0.        ,
       0.        , 0.        , 0.        ])

In [49]:
user_item_score_new.shape

(4, 12013)
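
The per-user loop can also be written as a single matrix product, since row i of the result is just the item-feature matrix times user i's feature vector. A small sketch on random stand-in data (the shapes are shrunk to 6 items and the values are hypothetical) that checks both forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the notebook's arrays: 4 users, 43 genres, 6 unwatched items
user_feature = rng.random((4, 43))
item_feature_new = rng.integers(0, 2, size=(6, 43))

# Loop version, as in the cell above
loop_scores = np.empty((4, 6))
for i in range(4):
    loop_scores[i, :] = np.matmul(item_feature_new, user_feature[i, :])

# Vectorized version: (4 x 43) @ (43 x 6) -> (4 x 6)
vectorized_scores = user_feature @ item_feature_new.T

print(np.allclose(loop_scores, vectorized_scores))
```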

In [50]:
# Transpose the user-item scores from (4x12013) to (12013x4)
df_score = pd.DataFrame(user_item_score_new).T

In [51]:
# Rename the columns
df_score.columns = df_user_item['user']

In [52]:
df_score

user,user 1,user 2,user 3,user 4
0,0.192308,0.205128,0.201258,0.194805
1,0.634615,0.653846,0.654088,0.655844
2,0.506410,0.500000,0.496855,0.500000
3,0.057692,0.044872,0.050314,0.045455
4,0.506410,0.500000,0.496855,0.500000
...,...,...,...,...
12008,0.000000,0.000000,0.000000,0.000000
12009,0.000000,0.000000,0.000000,0.000000
12010,0.000000,0.000000,0.000000,0.000000
12011,0.000000,0.000000,0.000000,0.000000


In [53]:
# Merge dataframes
df_score = pd.concat([df_item_feature_new[['anime_id', 'name']].reset_index(drop=True), df_score], axis=1)
df_score.head()

Unnamed: 0,anime_id,name,user 1,user 2,user 3,user 4
0,32281,Kimi no Na wa.,0.192308,0.205128,0.201258,0.194805
1,5114,Fullmetal Alchemist: Brotherhood,0.634615,0.653846,0.654088,0.655844
2,28977,Gintama°,0.50641,0.5,0.496855,0.5
3,9253,Steins;Gate,0.057692,0.044872,0.050314,0.045455
4,9969,Gintama&#039;,0.50641,0.5,0.496855,0.5


## **Recommendation for users**
Sort the anime so that the most recommended ones come first.

In [54]:
# Recommendation for user 1
df_score[['anime_id', 'name', 'user 1']].sort_values('user 1', ascending=False).head(10)

Unnamed: 0,anime_id,name,user 1
7702,1626,Genma Taisen,0.858974
2550,1397,Macross 7,0.858974
4224,1136,Betterman,0.858974
3208,1400,Macross 7 Movie: Ginga ga Ore wo Yondeiru!,0.858974
1758,573,Saber Marionette J,0.858974
799,154,Shaman King,0.839744
4035,23067,Tenkai Knights,0.807692
5161,3114,Chiisana Kyojin Microman,0.807692
2903,1022,Generator Gawl,0.801282
1626,2408,Keroro Gunsou Movie 2: Shinkai no Princess de ...,0.801282


In [55]:
# Recommendation for user 2
df_score[['anime_id', 'name', 'user 2']].sort_values('user 2', ascending=False).head(10)

Unnamed: 0,anime_id,name,user 2
799,154,Shaman King,0.858974
1758,573,Saber Marionette J,0.846154
2550,1397,Macross 7,0.846154
3208,1400,Macross 7 Movie: Ginga ga Ore wo Yondeiru!,0.846154
4224,1136,Betterman,0.846154
7702,1626,Genma Taisen,0.846154
227,19123,One Piece: Episode of Merry - Mou Hitori no Na...,0.807692
71,21,One Piece,0.807692
237,15323,One Piece: Episode of Nami - Koukaishi no Nami...,0.807692
892,31289,One Piece: Episode of Sabo - 3 Kyoudai no Kizu...,0.807692


In [56]:
# Recommendation for user 3
df_score[['anime_id', 'name', 'user 3']].sort_values('user 3', ascending=False).head(10)

Unnamed: 0,anime_id,name,user 3
7702,1626,Genma Taisen,0.849057
2550,1397,Macross 7,0.849057
799,154,Shaman King,0.849057
4224,1136,Betterman,0.849057
1758,573,Saber Marionette J,0.849057
3208,1400,Macross 7 Movie: Ginga ga Ore wo Yondeiru!,0.849057
4035,23067,Tenkai Knights,0.798742
71,21,One Piece,0.798742
3316,1186,Battle Athletess Daiundoukai (TV),0.798742
2903,1022,Generator Gawl,0.798742


In [57]:
# Recommendation for user 4
df_score[['anime_id', 'name', 'user 4']].sort_values('user 4', ascending=False).head(10)

Unnamed: 0,anime_id,name,user 4
799,154,Shaman King,0.850649
3208,1400,Macross 7 Movie: Ginga ga Ore wo Yondeiru!,0.844156
1758,573,Saber Marionette J,0.844156
2550,1397,Macross 7,0.844156
7702,1626,Genma Taisen,0.844156
4224,1136,Betterman,0.844156
71,21,One Piece,0.811688
892,31289,One Piece: Episode of Sabo - 3 Kyoudai no Kizu...,0.811688
237,15323,One Piece: Episode of Nami - Koukaishi no Nami...,0.811688
227,19123,One Piece: Episode of Merry - Mou Hitori no Na...,0.811688
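
The four cells above repeat the same sort for each user. One possible way to collect every user's top-N list in one pass is a small helper around `DataFrame.nlargest`. The sketch below uses the first five rows and two of the user columns of `df_score` shown earlier; the helper name `top_n` and the dictionary `top` are hypothetical.

```python
import pandas as pd

# First five rows of df_score, limited to two users
df_score = pd.DataFrame({
    'anime_id': [32281, 5114, 28977, 9253, 9969],
    'name': ['Kimi no Na wa.', 'Fullmetal Alchemist: Brotherhood',
             'Gintama°', 'Steins;Gate', 'Gintama&#039;'],
    'user 1': [0.192308, 0.634615, 0.50641, 0.057692, 0.50641],
    'user 2': [0.205128, 0.653846, 0.5, 0.044872, 0.5],
})

def top_n(scores, user, n=10):
    """Return the n highest-scoring anime for one user column."""
    return scores.nlargest(n, user)[['anime_id', 'name', user]]

# One small dataframe of recommendations per user
top = {user: top_n(df_score, user, n=2) for user in ['user 1', 'user 2']}

for user, table in top.items():
    print(user)
    print(table)
```

Many anime share the same score, so the order among tied titles depends on how ties are broken.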
