# 25 October 2021

## **Collaborative Filtering Exercise**

**Using the anime & rating datasets, build a recommendation system with the following steps:**

* Merge the two datasets so that the information from the anime dataset is shown alongside the ratings.
* Compare the SVD and ALS algorithms.
* Tune whichever algorithm you find performs better.

After obtaining the best model, predict the ratings of the following anime:

* Hunter x Hunter (2011), anime_id 11061
* Detective Conan OVA 09, anime_id 2514
* Ranma ½, anime_id 1010
* Saint Seiya: Meiou Hades Juuni Kyuu-hen, anime_id 1257 

for the following users:

* 50
* 200
* 400
* 800

In what order would you recommend these anime to each user?

## **Import libraries**

In [1]:
```python
import pandas as pd
import seaborn as sns

from surprise import Reader
from surprise import Dataset

from surprise import SVD
from surprise import BaselineOnly

from surprise import accuracy
from surprise.model_selection import cross_validate, train_test_split
from surprise.model_selection import GridSearchCV
```

## **Load dataset & preprocessing**

In [2]:
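```python
df = pd.read_csv('rating.csv')
df
```

|  | Unnamed: 0 | user_id | anime_id | rating |
|---|---|---|---|---|
| 0 | 47 | 1 | 8074 | 10.0 |
| 1 | 81 | 1 | 11617 | 10.0 |
| 2 | 83 | 1 | 11757 | 10.0 |
| 3 | 101 | 1 | 15451 | 10.0 |
| 4 | 153 | 2 | 11771 | 10.0 |
| ... | ... | ... | ... | ... |
| 77863 | 96433 | 999 | 11757 | 6.0 |
| 77864 | 96434 | 999 | 16498 | 9.0 |
| 77865 | 96435 | 999 | 21881 | 5.0 |
| 77866 | 96436 | 999 | 22319 | 8.0 |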


In [3]:
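```python
# Drop the leftover index column that was saved into the CSV
df = df.drop(columns='Unnamed: 0')
df.head(10)
```

|  | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 1 | 8074 | 10.0 |
| 1 | 1 | 11617 | 10.0 |
| 2 | 1 | 11757 | 10.0 |
| 3 | 1 | 15451 | 10.0 |
| 4 | 2 | 11771 | 10.0 |
| 5 | 3 | 20 | 8.0 |
| 6 | 3 | 154 | 6.0 |
| 7 | 3 | 170 | 9.0 |
| 8 | 3 | 199 | 10.0 |
| 9 | 3 | 225 | 9.0 |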

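The `Unnamed: 0` column is an index that was written into `rating.csv` when the file was saved. A minimal sketch of consuming it while reading instead of dropping it afterwards, assuming the first column of the file is that saved index; the two-row in-memory CSV (values taken from the first rows shown above) stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for rating.csv: the first, unnamed column
# is an index that DataFrame.to_csv() wrote to disk.
csv_text = """,user_id,anime_id,rating
47,1,8074,10.0
81,1,11617,10.0
"""

# index_col=0 turns that column into the index instead of an "Unnamed: 0" column.
ratings = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(ratings.columns.tolist())  # ['user_id', 'anime_id', 'rating']
```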

In [4]:
```python
df_anime = pd.read_csv('anime.csv')
df_anime.head()
```

|  | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


In [5]:
```python
df_merged = pd.merge(df, df_anime, how='left', on=['anime_id'])
df_merged
```

|  | user_id | anime_id | rating_x | name | genre | type | episodes | rating_y | members |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 8074 | 10.0 | Highschool of the Dead | Action, Ecchi, Horror, Supernatural | TV | 12 | 7.46 | 535892 |
| 1 | 1 | 11617 | 10.0 | High School DxD | Comedy, Demons, Ecchi, Harem, Romance, School | TV | 12 | 7.70 | 398660 |
| 2 | 1 | 11757 | 10.0 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | TV | 25 | 7.83 | 893100 |
| 3 | 1 | 15451 | 10.0 | High School DxD New | Action, Comedy, Demons, Ecchi, Harem, Romance,... | TV | 12 | 7.87 | 266657 |
| 4 | 2 | 11771 | 10.0 | Kuroko no Basket | Comedy, School, Shounen, Sports | TV | 25 | 8.46 | 338315 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77863 | 999 | 11757 | 6.0 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | TV | 25 | 7.83 | 893100 |
| 77864 | 999 | 16498 | 9.0 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | TV | 25 | 8.54 | 896229 |
| 77865 | 999 | 21881 | 5.0 | Sword Art Online II | Action, Adventure, Fantasy, Game, Romance | TV | 24 | 7.35 | 537892 |
| 77866 | 999 | 22319 | 8.0 | Tokyo Ghoul | Action, Drama, Horror, Mystery, Psychological,... | TV | 12 | 8.07 | 618056 |

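A left merge keeps every rating even when its `anime_id` has no row in `anime.csv`, in which case `name` and `genre` become NaN. A minimal sketch, on small hypothetical frames, of checking for such rows with `indicator=True`, and of using `suffixes` so the two `rating` columns get descriptive names instead of `rating_x`/`rating_y`:

```python
import pandas as pd

# Hypothetical toy frames standing in for df (ratings) and df_anime.
ratings = pd.DataFrame({'user_id': [1, 1, 2],
                        'anime_id': [8074, 11617, 99999],
                        'rating': [10.0, 10.0, 7.0]})
anime = pd.DataFrame({'anime_id': [8074, 11617],
                      'name': ['Highschool of the Dead', 'High School DxD'],
                      'rating': [7.46, 7.70]})

# indicator=True adds a "_merge" column telling where each row was found;
# suffixes renames the clashing "rating" columns.
merged = pd.merge(ratings, anime, how='left', on='anime_id',
                  indicator=True, suffixes=('_user', '_avg'))

# "left_only" rows are ratings whose anime_id is missing from the anime table.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['user_id', 'anime_id']])
```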

In [6]:
```python
# Keep only the columns needed for modeling and give the user's rating an explicit name
df_merged = df_merged.drop(columns=['type', 'episodes', 'rating_y', 'members'])
df_merged = df_merged.rename(columns={'rating_x': 'user_rating'})
df_merged
```

|  | user_id | anime_id | user_rating | name | genre |
|---|---|---|---|---|---|
| 0 | 1 | 8074 | 10.0 | Highschool of the Dead | Action, Ecchi, Horror, Supernatural |
| 1 | 1 | 11617 | 10.0 | High School DxD | Comedy, Demons, Ecchi, Harem, Romance, School |
| 2 | 1 | 11757 | 10.0 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance |
| 3 | 1 | 15451 | 10.0 | High School DxD New | Action, Comedy, Demons, Ecchi, Harem, Romance,... |
| 4 | 2 | 11771 | 10.0 | Kuroko no Basket | Comedy, School, Shounen, Sports |
| ... | ... | ... | ... | ... | ... |
| 77863 | 999 | 11757 | 6.0 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance |
| 77864 | 999 | 16498 | 9.0 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power |
| 77865 | 999 | 21881 | 5.0 | Sword Art Online II | Action, Adventure, Fantasy, Game, Romance |
| 77866 | 999 | 22319 | 8.0 | Tokyo Ghoul | Action, Drama, Horror, Mystery, Psychological,... |


## **Modeling**

In [7]:
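```python
# User x anime matrix (NaN = not rated), for inspection only;
# Surprise below works directly on the long-format ratings
user_item_rating_matrix = df_merged.pivot_table(values='user_rating', index='user_id', columns='anime_id')
```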

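The pivot table above is not used by Surprise, but it is a convenient way to see how sparse the data is. A sketch of that calculation on hypothetical toy ratings; the same arithmetic would apply to `user_item_rating_matrix`.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as df_merged.
ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'anime_id': [10, 20, 20, 30],
    'user_rating': [10.0, 8.0, 6.0, 9.0],
})

# Same pivot as In [7]: one row per user, one column per anime, NaN where unrated.
matrix = ratings.pivot_table(values='user_rating', index='user_id', columns='anime_id')

n_cells = matrix.shape[0] * matrix.shape[1]
n_observed = int(matrix.notna().sum().sum())
sparsity = 1 - n_observed / n_cells  # share of user-anime pairs with no rating
print(f'{n_observed}/{n_cells} cells observed, sparsity = {sparsity:.2%}')
```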
In [8]:
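```python
# Tell Surprise the rating scale, then load (user, item, rating) columns in that order
reader = Reader(rating_scale=(0, 10))
data = Dataset.load_from_df(df_merged[['user_id', 'anime_id', 'user_rating']], reader)
```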

### **Validation**

In [9]:
```python
# Hold out 20% of the ratings for testing; random_state makes the split reproducible
trainset, testset = train_test_split(data, test_size=0.2, random_state=1)
```

### **SVD**

In [10]:
```python
algo_svd = SVD()

algo_svd.fit(trainset)
prediction_svd = algo_svd.test(testset)
```

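Surprise documents its `SVD` class as biased matrix factorization: a rating is predicted as mu + b_u + b_i + q_i · p_u (global mean, user bias, item bias, and a dot product of latent factors), and all of these except mu are learned by stochastic gradient descent. The following is a minimal pure-Python sketch of that idea, not the library's code; the helper `train_biased_mf`, the toy ratings, and the larger learning rate and epoch count (chosen so the tiny example converges) are assumptions.

```python
import math
import random


def train_biased_mf(ratings, n_factors=2, n_epochs=300, lr=0.01, reg=0.02, seed=0):
    """Hypothetical helper: fit mu + b_u + b_i + q_i . p_u by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean, kept fixed
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    # Small random initial factors (normally distributed, std 0.1).
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(pu[u], qi[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both factor vectors are updated from their values before this step.
            old_p, old_q = pu[u], qi[i]
            pu[u] = [p + lr * (err * q - reg * p) for p, q in zip(old_p, old_q)]
            qi[i] = [q + lr * (err * p - reg * q) for p, q in zip(old_p, old_q)]

    return mu, predict


toy = [(1, 'a', 10.0), (1, 'b', 8.0), (2, 'a', 6.0), (2, 'c', 4.0), (3, 'b', 9.0), (3, 'c', 7.0)]
mu, predict = train_biased_mf(toy)
train_rmse = math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in toy) / len(toy))
mean_rmse = math.sqrt(sum((r - mu) ** 2 for _, _, r in toy) / len(toy))
print(f'train RMSE {train_rmse:.3f} vs. global-mean RMSE {mean_rmse:.3f}')
```

Comparing against the global-mean baseline shows that the learned biases and factors explain part of the variation that a single average cannot.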
In [11]:
```python
accuracy.rmse(prediction_svd)
```

RMSE: 1.2071


1.2071363733090659
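`accuracy.rmse` compares each prediction's true rating (`r_ui`) with its estimate (`est`). A sketch of the two metrics used in this notebook, on hypothetical (true, predicted) pairs; RMSE squares the errors before averaging, so it penalizes large misses more than MAE does.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


# Hypothetical pairs, e.g. built from [(p.r_ui, p.est) for p in prediction_svd].
pairs = [(10.0, 9.0), (6.0, 8.0), (8.0, 8.0), (9.0, 7.0)]
print(rmse(pairs), mae(pairs))  # 1.5 1.25
```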

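### **ALS**

Note that `BaselineOnly` with `method='als'` is not a latent-factor model: it predicts each rating as the global mean plus a user bias and an item bias, and uses alternating least squares to estimate those biases.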

In [12]:
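```python
# Baseline estimates fitted with ALS; these values equal Surprise's ALS defaults
bsl_options = {
    'method': 'als',
    'n_epochs': 10,
    'reg_u': 15,  # regularization of the user biases
    'reg_i': 10,  # regularization of the item biases
}
algo_als = BaselineOnly(bsl_options=bsl_options)

algo_als.fit(trainset)
prediction_als = algo_als.test(testset)
```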

Estimating biases using als...
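With `method='als'`, the baseline is fitted by alternating between solving for the item biases with the user biases held fixed and the reverse, while `reg_u` and `reg_i` shrink the biases toward zero for users and items with few ratings. A minimal pure-Python sketch of that procedure, modeled on the regularized baseline formulas Surprise documents; the helper `baseline_als` and the toy data are hypothetical, and details may differ from the library's implementation.

```python
from collections import defaultdict


def baseline_als(ratings, n_epochs=10, reg_u=15, reg_i=10):
    """Hypothetical helper: estimate b_u and b_i for the model r_ui ~ mu + b_u + b_i."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    bu = {u: 0.0 for u in by_user}
    bi = {i: 0.0 for i in by_item}
    for _ in range(n_epochs):
        # Hold b_u fixed and solve the regularized least-squares problem for each b_i ...
        for i, rs in by_item.items():
            bi[i] = sum(r - mu - bu[u] for u, r in rs) / (reg_i + len(rs))
        # ... then hold b_i fixed and solve for each b_u.
        for u, rs in by_user.items():
            bu[u] = sum(r - mu - bi[i] for i, r in rs) / (reg_u + len(rs))
    return mu, bu, bi


# Hypothetical data: user 1 rates everything 10, user 2 rates everything 4.
mu, bu, bi = baseline_als([(1, 'a', 10.0), (1, 'b', 10.0), (2, 'a', 4.0), (2, 'b', 4.0)])
print(mu, bu, bi)
```

Here mu is 7.0, but with `reg_u=15` user 1 gets a bias of only 6/17 (about 0.35) instead of 3, which shows how strongly these settings shrink users with only a couple of ratings.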


In [13]:
```python
accuracy.rmse(prediction_als)
```

RMSE: 1.2128


1.2127696615627046

## **Cross Validation**

### **SVD**

In [14]:
```python
cv_svd = cross_validate(algo_svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

|  | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean | Std |
|---|---|---|---|---|---|---|---|
| RMSE (testset) | 1.2012 | 1.1893 | 1.2106 | 1.2228 | 1.2047 | 1.2057 | 0.0110 |
| MAE (testset) | 0.9107 | 0.9034 | 0.9178 | 0.9252 | 0.9158 | 0.9146 | 0.0073 |
| Fit time (s) | 3.57 | 3.50 | 3.65 | 3.96 | 4.09 | 3.76 | 0.23 |
| Test time (s) | 0.18 | 0.10 | 0.13 | 0.19 | 0.12 | 0.14 | 0.03 |


In [15]:
```python
print('RMSE cv mean', cv_svd['test_rmse'].mean())
```

RMSE cv mean 1.2057367102537007
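With `cv=5`, `cross_validate` splits the ratings into five folds and trains five models, each tested on a different fold; the Mean and Std columns summarize those five scores. A sketch of the index bookkeeping behind such a split, assuming the 77,867 ratings indexed 0 to 77866 above; the helper `kfold_indices` is hypothetical and is not Surprise's own `KFold`.

```python
import random


def kfold_indices(n_ratings, n_splits=5, seed=0):
    """Hypothetical helper: yield (train_idx, test_idx) for each of n_splits folds."""
    idx = list(range(n_ratings))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are random subsets
    # Fold sizes differ by at most one when n_ratings is not divisible by n_splits.
    sizes = [n_ratings // n_splits + (1 if k < n_ratings % n_splits else 0) for k in range(n_splits)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]  # all other ratings are used for training
        yield train, test
        start += size


folds = list(kfold_indices(77867))
print([len(test) for _, test in folds])  # [15574, 15574, 15573, 15573, 15573]
```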


### **ALS**

In [16]:
```python
cv_als = cross_validate(algo_als, data, measures=['RMSE', 'MAE'], cv=5, verbose=False)
```

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...


In [17]:
```python
print('RMSE cv mean', cv_als['test_rmse'].mean())
```

RMSE cv mean 1.2130867872330853


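## **Hyperparameter tuning**

SVD has the lower mean cross-validated RMSE (1.2057 vs 1.2131 for ALS), so SVD is the algorithm tuned here.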

In [18]:
```python
# Candidate values; GridSearchCV evaluates every combination with 5-fold CV
hyperparam_space = {
    'n_epochs': [5, 10, 20, 30],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.4, 0.6],
}

grid_search = GridSearchCV(SVD, hyperparam_space, measures=['rmse', 'mae'], cv=5)

grid_search.fit(data)
```

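Counting the combinations with `itertools.product` shows how much work this grid search does:

```python
from itertools import product

# Same search space as In [18].
hyperparam_space = {
    'n_epochs': [5, 10, 20, 30],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.4, 0.6],
}

# One dict per combination, in the same shape as the best_params entries printed below.
combos = [dict(zip(hyperparam_space, values)) for values in product(*hyperparam_space.values())]
n_folds = 5
print(len(combos), 'combinations x', n_folds, 'folds =', len(combos) * n_folds, 'fits')
# 24 combinations x 5 folds = 120 fits
```

Each extra value in any list multiplies the number of fits, which is why the grid is kept small.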
In [19]:
```python
print('RMSE')
print(grid_search.best_score['rmse'])
print(grid_search.best_params['rmse'])
print('\nMAE')
print(grid_search.best_score['mae'])
print(grid_search.best_params['mae'])
```

RMSE
1.2043158700397465
{'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}

MAE
0.9125449864650055
{'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}


In [21]:
```python
svd_tuned = SVD(n_epochs=20, lr_all=0.005, reg_all=0.02)
cv_svd_tuned = cross_validate(svd_tuned, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
```

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

|  | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean | Std |
|---|---|---|---|---|---|---|---|
| RMSE (testset) | 1.2035 | 1.2010 | 1.2149 | 1.2057 | 1.2067 | 1.2064 | 0.0047 |
| MAE (testset) | 0.9107 | 0.9147 | 0.9163 | 0.9197 | 0.9148 | 0.9152 | 0.0029 |
| Fit time (s) | 4.27 | 3.95 | 4.07 | 3.86 | 3.88 | 4.00 | 0.15 |
| Test time (s) | 0.24 | 0.13 | 0.12 | 0.21 | 0.11 | 0.16 | 0.05 |


In [22]:
```python
print('RMSE cv mean before tuning:', cv_svd['test_rmse'].mean())
print('RMSE cv mean after tuning:', cv_svd_tuned['test_rmse'].mean())
```

RMSE cv mean before tuning: 1.2057367102537007
RMSE cv mean after tuning: 1.2063815679201373
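
The best parameters found by the grid search (`n_epochs=20`, `lr_all=0.005`, `reg_all=0.02`) are the default values of Surprise's `SVD`, so the "tuned" model has the same configuration as the untuned one. The small difference between the two means (1.2057 vs 1.2064) therefore comes from different random folds and initialization, not from tuning.
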
## **Prediction results**

* Hunter x Hunter (2011), anime_id 11061
* Detective Conan OVA 09, anime_id 2514
* Ranma ½, anime_id 1010
* Saint Seiya: Meiou Hades Juuni Kyuu-hen, anime_id 1257 

In [23]:
```python
users = [50, 200, 400, 800]
anime_ids = [11061, 2514, 1010, 1257]
titles = ['Hunter x Hunter (2011)', 'Detective Conan OVA 09', 'Ranma ½', 'Saint Seiya: Meiou Hades Juuni Kyuu-hen']

# One row per (user, anime) pair. DataFrame.append was removed in pandas 2.0,
# so the rows are collected in a list and the frame is built once.
rows = [
    {'user_id': user, 'anime_id': anime_id, 'title': title}
    for user in users
    for anime_id, title in zip(anime_ids, titles)
]
df_test = pd.DataFrame(rows, columns=['user_id', 'anime_id', 'title'])

df_test
```

|  | user_id | anime_id | title |
|---|---|---|---|
| 0 | 50 | 11061 | Hunter x Hunter (2011) |
| 1 | 50 | 2514 | Detective Conan OVA 09 |
| 2 | 50 | 1010 | Ranma ½ |
| 3 | 50 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen |
| 4 | 200 | 11061 | Hunter x Hunter (2011) |
| 5 | 200 | 2514 | Detective Conan OVA 09 |
| 6 | 200 | 1010 | Ranma ½ |
| 7 | 200 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen |
| 8 | 400 | 11061 | Hunter x Hunter (2011) |
| 9 | 400 | 2514 | Detective Conan OVA 09 |


In [24]:
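```python
# Final model with the hyperparameters selected by the grid search.
# It is fitted on the 80% trainset from In [9]; data.build_full_trainset()
# would let it learn from every available rating instead.
svd_predict = SVD(n_epochs=20, lr_all=0.005, reg_all=0.02)
svd_predict.fit(trainset)

y = []

for index, row in df_test.iterrows():
    prediction = svd_predict.predict(row.user_id, row.anime_id)
    y.append(prediction.est)  # .est is the estimated rating (field 3 of the Prediction tuple)

df_test['predicted_rating'] = y

# Within each user, list the anime from highest to lowest predicted rating
df_test.sort_values(by=['user_id', 'predicted_rating'], ascending=[True, False], inplace=True)
```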

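User 200's prediction for Hunter x Hunter (2011) is exactly 10.0 because `predict` clips estimates to the rating scale by default (`clip=True`). A sketch of how a biased SVD estimate is assembled and clipped, modeled on Surprise's documented behavior that an unknown user or item contributes no bias and no factor term; the helper `svd_estimate` and all numbers are hypothetical.

```python
def svd_estimate(mu, bu, bi, pu, qi, u, i, rating_scale=(0, 10)):
    """Hypothetical helper: biased-SVD estimate for user u and item i, clipped to rating_scale."""
    known_user, known_item = u in bu, i in bi
    est = mu  # start from the global mean
    if known_user:
        est += bu[u]
    if known_item:
        est += bi[i]
    if known_user and known_item:
        est += sum(p * q for p, q in zip(pu[u], qi[i]))  # latent-factor interaction
    lower, upper = rating_scale
    return min(upper, max(lower, est))  # same effect as predict(..., clip=True)


# Hypothetical parameters for one user and one anime.
mu, bu, bi = 7.0, {200: 2.0}, {11061: 1.5}
pu, qi = {200: [1.0, 0.5]}, {11061: [0.4, 0.2]}
print(svd_estimate(mu, bu, bi, pu, qi, 200, 11061))  # 11.0 before clipping -> 10
print(svd_estimate(mu, bu, bi, pu, qi, 999, 11061))  # unknown user: 7.0 + 1.5 = 8.5
```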
In [26]:
```python
df_test[df_test['user_id'] == 50]
```

|  | user_id | anime_id | title | predicted_rating |
|---|---|---|---|---|
| 0 | 50 | 11061 | Hunter x Hunter (2011) | 9.479294 |
| 3 | 50 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen | 8.119296 |
| 2 | 50 | 1010 | Ranma ½ | 7.811183 |
| 1 | 50 | 2514 | Detective Conan OVA 09 | 7.62979 |


In [27]:
```python
df_test[df_test['user_id'] == 200]
```

|  | user_id | anime_id | title | predicted_rating |
|---|---|---|---|---|
| 4 | 200 | 11061 | Hunter x Hunter (2011) | 10.0 |
| 7 | 200 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen | 8.927797 |
| 5 | 200 | 2514 | Detective Conan OVA 09 | 8.727203 |
| 6 | 200 | 1010 | Ranma ½ | 8.696162 |


In [28]:
```python
df_test[df_test['user_id'] == 400]
```

|  | user_id | anime_id | title | predicted_rating |
|---|---|---|---|---|
| 8 | 400 | 11061 | Hunter x Hunter (2011) | 8.110673 |
| 11 | 400 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen | 6.336291 |
| 9 | 400 | 2514 | Detective Conan OVA 09 | 6.229624 |
| 10 | 400 | 1010 | Ranma ½ | 6.165477 |


In [29]:
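```python
df_test[df_test['user_id'] == 800]
```

|  | user_id | anime_id | title | predicted_rating |
|---|---|---|---|---|
| 12 | 800 | 11061 | Hunter x Hunter (2011) | 9.530207 |
| 15 | 800 | 1257 | Saint Seiya: Meiou Hades Juuni Kyuu-hen | 8.191965 |
| 13 | 800 | 2514 | Detective Conan OVA 09 | 7.904181 |
| 14 | 800 | 1010 | Ranma ½ | 7.845067 |

For every user, Hunter x Hunter (2011) is recommended first and Saint Seiya: Meiou Hades Juuni Kyuu-hen second. Users 200, 400 and 800 then get Detective Conan OVA 09 followed by Ranma ½, while user 50 gets Ranma ½ before Detective Conan OVA 09.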

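The per-user ordering can also be produced in one step by ranking the predictions within each user instead of filtering user by user. A sketch using the predicted ratings printed above (copied by hand, so rounded to six decimals):

```python
import pandas as pd

# Predicted ratings copied from the outputs of In [26] to In [29].
preds = pd.DataFrame({
    'user_id': [50] * 4 + [200] * 4 + [400] * 4 + [800] * 4,
    'anime_id': [11061, 2514, 1010, 1257] * 4,
    'predicted_rating': [9.479294, 7.62979, 7.811183, 8.119296,
                         10.0, 8.727203, 8.696162, 8.927797,
                         8.110673, 6.229624, 6.165477, 6.336291,
                         9.530207, 7.904181, 7.845067, 8.191965],
})

# Rank within each user: 1 = recommend first.
preds['rank'] = (preds.groupby('user_id')['predicted_rating']
                 .rank(ascending=False, method='first').astype(int))

# Recommendation order (anime_id list) per user.
order = preds.sort_values(['user_id', 'rank']).groupby('user_id')['anime_id'].apply(list)
print(order)
```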