## **Collaborative Filtering Exercise**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Using the anime and rating datasets, build a recommendation system as follows:**

* Merge the two datasets so that the information in the anime dataset is available alongside each rating.
* Compare the SVD and ALS algorithms.
* Tune whichever algorithm performs better.

Once you have the best model, predict the ratings for the following anime:

* Hunter x Hunter (2011), anime_id 11061
* Detective Conan OVA 09, anime_id 2514
* Ranma ½, anime_id 1010
* Saint Seiya: Meiou Hades Juuni Kyuu-hen, anime_id 1257

For users:

* 50
* 200
* 400
* 800

In what order would you recommend these anime to each user?

## **Import libraries**

* http://surpriselib.com/
* https://surprise.readthedocs.io/en/stable/

In [None]:
!pip install surprise

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.1.tar.gz (11.8 MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1633967 sha256=6d381382c00c4e73ccf3018b37c44bb3bfba3a234fcc3e94fcb21d06546e884d
  Stored in directory: /root/.cache/pip/wheels/76/44/74/b498c42be47b2406bd27994e16c5188e337c657025ab400c1c
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.1 surprise-0.1


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns

# Dataset formatting
from surprise import Reader
from surprise import Dataset

from surprise import SVD            # SVD
from surprise import BaselineOnly   # baseline (bias) estimates, fitted with ALS below

from surprise import accuracy
from surprise.model_selection import cross_validate, train_test_split
from surprise.model_selection import GridSearchCV

## **Load dataset & preprocessing**

In [None]:
df_rating = pd.read_csv('/content/drive/MyDrive/JCDSVL-04, 06, 07, JCDSAHLS-01 Practice Session/Modul 3/Week 10/Wednesday, October 26, 2022/rating.csv')
df_rating

Unnamed: 0.1,Unnamed: 0,user_id,anime_id,rating
0,47,1,8074,10.0
1,81,1,11617,10.0
2,83,1,11757,10.0
3,101,1,15451,10.0
4,153,2,11771,10.0
...,...,...,...,...
77863,96433,999,11757,6.0
77864,96434,999,16498,9.0
77865,96435,999,21881,5.0
77866,96436,999,22319,8.0


In [None]:
# Drop the leftover index column from the CSV export
df_rating = df_rating.drop(columns='Unnamed: 0')
df_rating.head(10)

Unnamed: 0,user_id,anime_id,rating
0,1,8074,10.0
1,1,11617,10.0
2,1,11757,10.0
3,1,15451,10.0
4,2,11771,10.0
5,3,20,8.0
6,3,154,6.0
7,3,170,9.0
8,3,199,10.0
9,3,225,9.0


In [None]:
df_anime = pd.read_csv('/content/drive/MyDrive/JCDSVL-04, 06, 07, JCDSAHLS-01 Practice Session/Modul 3/Week 10/Tuesday, October 25, 2022/anime.csv')
df_anime.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [None]:
# Merge df_rating and df_anime with a left join on anime_id
df_merged = pd.merge(df_rating, df_anime, how='left', on=['anime_id'])
df_merged

Unnamed: 0,user_id,anime_id,rating_x,name,genre,type,episodes,rating_y,members
0,1,8074,10.0,Highschool of the Dead,"Action, Ecchi, Horror, Supernatural",TV,12,7.46,535892
1,1,11617,10.0,High School DxD,"Comedy, Demons, Ecchi, Harem, Romance, School",TV,12,7.70,398660
2,1,11757,10.0,Sword Art Online,"Action, Adventure, Fantasy, Game, Romance",TV,25,7.83,893100
3,1,15451,10.0,High School DxD New,"Action, Comedy, Demons, Ecchi, Harem, Romance,...",TV,12,7.87,266657
4,2,11771,10.0,Kuroko no Basket,"Comedy, School, Shounen, Sports",TV,25,8.46,338315
...,...,...,...,...,...,...,...,...,...
77863,999,11757,6.0,Sword Art Online,"Action, Adventure, Fantasy, Game, Romance",TV,25,7.83,893100
77864,999,16498,9.0,Shingeki no Kyojin,"Action, Drama, Fantasy, Shounen, Super Power",TV,25,8.54,896229
77865,999,21881,5.0,Sword Art Online II,"Action, Adventure, Fantasy, Game, Romance",TV,24,7.35,537892
77866,999,22319,8.0,Tokyo Ghoul,"Action, Drama, Horror, Mystery, Psychological,...",TV,12,8.07,618056
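
Because this is a left join, a rating whose anime_id is missing from the anime table would still be kept, with NaN in the name and genre columns. A minimal sketch of how one might check for such rows, using small made-up frames (`toy_rating` and `toy_anime` are illustrative, not the notebook's data) and the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy frames (hypothetical values) standing in for df_rating and df_anime
toy_rating = pd.DataFrame({'user_id': [1, 1, 2],
                           'anime_id': [10, 20, 30],
                           'rating': [8.0, 9.0, 7.0]})
toy_anime = pd.DataFrame({'anime_id': [10, 20],
                          'name': ['A', 'B']})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
checked = pd.merge(toy_rating, toy_anime, how='left', on='anime_id', indicator=True)

# Ratings whose anime_id has no entry in the anime table
n_unmatched = int((checked['_merge'] == 'left_only').sum())
print(n_unmatched)
```

Running the same check on the real frames would show whether any rated anime lacks a title before the names are used for recommendations.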


In [None]:
# Drop the columns that are not used
df_merged = df_merged.drop(columns=['type', 'episodes', 'rating_y', 'members'])

# Rename 'rating_x' to 'user_rating'
df_merged = df_merged.rename(columns={'rating_x': 'user_rating'})
df_merged

Unnamed: 0,user_id,anime_id,user_rating,name,genre
0,1,8074,10.0,Highschool of the Dead,"Action, Ecchi, Horror, Supernatural"
1,1,11617,10.0,High School DxD,"Comedy, Demons, Ecchi, Harem, Romance, School"
2,1,11757,10.0,Sword Art Online,"Action, Adventure, Fantasy, Game, Romance"
3,1,15451,10.0,High School DxD New,"Action, Comedy, Demons, Ecchi, Harem, Romance,..."
4,2,11771,10.0,Kuroko no Basket,"Comedy, School, Shounen, Sports"
...,...,...,...,...,...
77863,999,11757,6.0,Sword Art Online,"Action, Adventure, Fantasy, Game, Romance"
77864,999,16498,9.0,Shingeki no Kyojin,"Action, Drama, Fantasy, Shounen, Super Power"
77865,999,21881,5.0,Sword Art Online II,"Action, Adventure, Fantasy, Game, Romance"
77866,999,22319,8.0,Tokyo Ghoul,"Action, Drama, Horror, Mystery, Psychological,..."


In [None]:
df_merged.describe()
# ratings range from 1 to 10

Unnamed: 0,user_id,anime_id,user_rating
count,77868.0,77868.0,77868.0
mean,517.812786,10721.879116,7.855268
std,278.020509,9033.079184,1.53807
min,1.0,1.0,1.0
25%,288.0,2273.0,7.0
50%,529.0,9513.0,8.0
75%,753.0,16592.0,9.0
max,999.0,34240.0,10.0


In [None]:
# Pivot into a user-item rating matrix (unrated cells become NaN)
user_item_rating_matrix = df_merged.pivot_table(values='user_rating', index='user_id', columns='anime_id')
user_item_rating_matrix

user_id \ anime_id,1,5,6,7,8,15,16,17,18,19,...,33338,33341,33372,33421,33524,33558,33569,33964,34103,34240
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
5,,,8.0,,,6.0,,6.0,6.0,,...,,,,,,,,,,
7,,,,,,,,,,,...,,7.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,,,,,,,,,,9.0,...,,,,,,,,,,
996,,,,,,,,,,,...,,,,,,,,,,
997,9.0,,,,,,,,,,...,,,,,,,,,,
998,,,,,,,,,,,...,,,,,,,,,,


The user-item rating matrix consists of 940 users and 4,510 anime.
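
To get a feel for how sparse this matrix is, the counts reported in this notebook can be plugged into a quick calculation. This is a rough sketch that assumes each user rated a given anime at most once (otherwise `pivot_table` would have averaged the duplicates and the density would be slightly overstated):

```python
# Figures reported in this notebook
n_ratings = 77868   # rows in df_merged
n_users = 940       # rows of the user-item matrix
n_items = 4510      # columns of the user-item matrix

# Fraction of cells that hold a rating (assumes one rating per user-anime pair)
density = n_ratings / (n_users * n_items)
sparsity = 1 - density

print(f"density={density:.4f}, sparsity={sparsity:.4f}")
```

Under that assumption fewer than 2% of the cells are filled, which is why the models below learn from the observed ratings only rather than from the full matrix.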

## **Modeling**

In [None]:
reader = Reader(rating_scale=(1, 10))

data = Dataset.load_from_df(df_merged[['user_id', 'anime_id', 'user_rating']], reader)
data

<surprise.dataset.DatasetAutoFolds at 0x7f952f22d5d0>

## **Validation**

In [None]:
trainset, testset = train_test_split(data, test_size=0.2, random_state=1)

### **SVD**

In [None]:
algo_svd = SVD()

algo_svd.fit(trainset)
prediction_svd = algo_svd.test(testset)

In [None]:
accuracy.rmse(prediction_svd)

RMSE: 1.2055


1.2054769057226464
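
For reference, RMSE is the square root of the mean squared difference between the true rating and the estimate, and MAE (used in the cross-validation tables later) is the mean absolute difference. A small pure-Python sketch on made-up pairs; `toy_pairs` and the helper names are illustrative and are not part of Surprise:

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)

# Made-up (true, estimated) ratings on the 1-10 scale
toy_pairs = [(10.0, 9.0), (8.0, 8.5), (6.0, 7.5)]

toy_rmse = rmse(toy_pairs)
toy_mae = mae(toy_pairs)
print(round(toy_rmse, 4), round(toy_mae, 4))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more than MAE does, so the two metrics can rank models differently.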

### **ALS**

In [None]:
bsl_options = {'method': 'als',
               'n_epochs': 10,
               'reg_u': 15,
               'reg_i': 10
               }

algo_als = BaselineOnly(bsl_options=bsl_options)

algo_als.fit(trainset)
prediction_als = algo_als.test(testset)

Estimating biases using als...


In [None]:
accuracy.rmse(prediction_als)

RMSE: 1.2128


1.2127696615627046
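
Note that `BaselineOnly` does not factorize the rating matrix. It predicts each rating as the global mean plus a user bias and an item bias, and `'method': 'als'` fits those biases by alternating between the two sets. A rough pure-Python sketch of that idea follows; the function name `als_baselines` and the toy ratings are illustrative, and the update rule follows the regularized baseline estimates as I understand them from the Surprise documentation rather than the library's actual code:

```python
def als_baselines(ratings, n_epochs=10, reg_u=15, reg_i=10):
    """ratings: list of (user, item, rating). Returns (mu, b_u, b_i)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(n_epochs):
        # Item step: hold user biases fixed, shrink by reg_i
        for item in b_i:
            resid = [r - mu - b_u[u] for u, i, r in ratings if i == item]
            b_i[item] = sum(resid) / (reg_i + len(resid))
        # User step: hold item biases fixed, shrink by reg_u
        for user in b_u:
            resid = [r - mu - b_i[i] for u, i, r in ratings if u == user]
            b_u[user] = sum(resid) / (reg_u + len(resid))
    return mu, b_u, b_i

def baseline_predict(mu, b_u, b_i, user, item):
    # Unknown users or items fall back to a bias of 0
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

# Toy data: user 'a' rates generously, user 'b' rates harshly
toy = [('a', 'x', 10), ('a', 'y', 9), ('b', 'x', 6), ('b', 'y', 5)]
mu, b_u, b_i = als_baselines(toy)
print(mu, b_u, b_i)
```

Larger `reg_u` and `reg_i` pull the biases toward zero, which matters most for users and anime with few ratings.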

## **Cross Validation**

### **SVD**

In [None]:
cv_svd = cross_validate(algo_svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.1967  1.2074  1.1909  1.2128  1.2214  1.2058  0.0110  
MAE (testset)     0.9097  0.9130  0.9028  0.9206  0.9306  0.9153  0.0095  
Fit time          3.92    6.77    3.98    3.89    3.92    4.50    1.14    
Test time         0.12    0.40    0.11    0.11    0.22    0.19    0.11    


In [None]:
print('RMSE cv mean', cv_svd['test_rmse'].mean())

RMSE cv mean 1.2058402680844675


### **ALS**

In [None]:
cv_als = cross_validate(algo_als, data, measures=['RMSE','MAE'], cv=5, verbose=True)

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Evaluating RMSE, MAE of algorithm BaselineOnly on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2120  1.2163  1.2082  1.2110  1.2155  1.2126  0.0030  
MAE (testset)     0.9299  0.9247  0.9178  0.9238  0.9254  0.9243  0.0039  
Fit time          0.30    0.33    0.34    0.36    0.35    0.34    0.02    
Test time         0.07    0.09    0.20    0.09    0.08    0.11    0.05    


In [None]:
print('RMSE cv mean', cv_als['test_rmse'].mean())

RMSE cv mean 1.212596076999343
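
One way to judge whether the gap between the two models is meaningful is to compare it with the fold-to-fold spread. The sketch below re-derives the Mean and Std columns from the per-fold RMSE values printed in the two tables above. It is only a rough comparison, since the two runs used different random folds:

```python
from statistics import mean, pstdev

# Per-fold RMSE values copied from the two cross_validate tables above
svd_rmse = [1.1967, 1.2074, 1.1909, 1.2128, 1.2214]
als_rmse = [1.2120, 1.2163, 1.2082, 1.2110, 1.2155]

# pstdev is the population standard deviation
svd_mean, svd_std = mean(svd_rmse), pstdev(svd_rmse)
als_mean, als_std = mean(als_rmse), pstdev(als_rmse)
gap = als_mean - svd_mean

print(f"SVD {svd_mean:.4f} +/- {svd_std:.4f}")
print(f"ALS {als_mean:.4f} +/- {als_std:.4f}")
print(f"gap {gap:.4f}")
```

SVD has the lower mean RMSE, but the gap is smaller than SVD's own standard deviation across folds, so the advantage is modest.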


## **Hyperparameter tuning**

In [None]:
# Tuning SVD
hyperparam_space = {
    'n_epochs': [5, 10, 20, 30],    # number of training epochs
    'lr_all': [0.002, 0.005],       # learning rate
    'reg_all': [0.02, 0.4, 0.6]     # regularization
}

grid_search = GridSearchCV(SVD, hyperparam_space, measures=['rmse', 'mae'], cv=5)

grid_search.fit(data)

In [None]:
print('RMSE')
print(grid_search.best_score['rmse'])
print(grid_search.best_params['rmse'])

print('\nMAE')
print(grid_search.best_score['mae'])
print(grid_search.best_params['mae'])

RMSE
1.205612856820332
{'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}

MAE
0.9147838755242823
{'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}
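
The grid is small enough to enumerate by hand. The sketch below lists every combination and counts the model fits that a 5-fold search implies. `svd_defaults` reflects the default `SVD` settings as documented by Surprise, which is an assumption about the installed version:

```python
from itertools import product

hyperparam_space = {
    'n_epochs': [5, 10, 20, 30],
    'lr_all': [0.002, 0.005],
    'reg_all': [0.02, 0.4, 0.6],
}

# Every combination the grid search evaluates
combos = [dict(zip(hyperparam_space, values))
          for values in product(*hyperparam_space.values())]
n_fits = len(combos) * 5  # one fit per combination per fold

# Documented Surprise defaults for SVD (assumed unchanged in this install)
svd_defaults = {'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}

print(len(combos), n_fits, svd_defaults in combos)
```

The best combination reported above matches those documented defaults, so the "tuned" model is effectively the same configuration as the untuned one. Any RMSE difference between them would then come from random fold assignment and initialization rather than from the hyperparameters.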


In [None]:
# Example of tuning the ALS method
# param_grid = {'bsl_options': {'method': ['als'],
#                               'n_epochs': [5,10,15],
#                               'reg_u': [12, 18, 27],
#                               'reg_i': [5,50,100]}
#               }

# gs = GridSearchCV(BaselineOnly, param_grid, measures=['rmse', 'mae'], cv=3)

# gs.fit(data)

## **Model with Hyperparameter Tuning**

In [None]:
svd_tuned = SVD(n_epochs = 20, lr_all = 0.005, reg_all = 0.02)
cv_svd_tuned = cross_validate(svd_tuned, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.2099  1.2029  1.2029  1.2036  1.2010  1.2040  0.0030  
MAE (testset)     0.9132  0.9165  0.9153  0.9162  0.9094  0.9141  0.0026  
Fit time          3.98    3.92    3.93    3.93    3.95    3.94    0.02    
Test time         0.13    0.25    0.11    0.12    0.25    0.17    0.06    


In [None]:
# Compare RMSE before and after tuning
print('RMSE cv mean before tuning:', cv_svd['test_rmse'].mean())
print('RMSE cv mean after tuning:', cv_svd_tuned['test_rmse'].mean())

RMSE cv mean before tuning: 1.2058402680844675
RMSE cv mean after tuning: 1.2040470840776052


## **Prediction results**

* Hunter x Hunter (2011), anime_id 11061
* Detective Conan OVA 09, anime_id 2514
* Ranma ½, anime_id 1010
* Saint Seiya: Meiou Hades Juuni Kyuu-hen, anime_id 1257

In [None]:
users = [50, 200, 400, 800]
anime_ids = [11061, 2514, 1010, 1257]
titles = ['Hunter x Hunter (2011)', 'Detective Conan OVA 09', 'Ranma ½', 'Saint Seiya: Meiou Hades Juuni Kyuu-hen']

# Build one row per (user, anime) pair, keeping the title alongside.
# DataFrame.append was removed in pandas 2.0, so collect the rows first.
rows = [{'user_id': u, 'anime_id': a, 'title': t}
        for u in users
        for a, t in zip(anime_ids, titles)]

df_test = pd.DataFrame(rows, columns=['user_id', 'anime_id', 'title'])

df_test

Unnamed: 0,user_id,anime_id,title
0,50,11061,Hunter x Hunter (2011)
1,50,2514,Detective Conan OVA 09
2,50,1010,Ranma ½
3,50,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen
4,200,11061,Hunter x Hunter (2011)
5,200,2514,Detective Conan OVA 09
6,200,1010,Ranma ½
7,200,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen
8,400,11061,Hunter x Hunter (2011)
9,400,2514,Detective Conan OVA 09


In [None]:
# Define the model with the tuned hyperparameters
svd_predict = SVD(n_epochs=20, lr_all=0.005, reg_all=0.02)

# Fit on the training split
svd_predict.fit(trainset)

# Collect the predicted scores
y = []

# Predict one row at a time
for index, row in df_test.iterrows():
    est = svd_predict.predict(row['user_id'], row['anime_id'])
    y.append(est.est)

df_test['predicted_rating'] = y

df_test.sort_values(by=['user_id', 'predicted_rating'], ascending=[True, False], inplace=True)
df_test

Unnamed: 0,user_id,anime_id,title,predicted_rating
0,50,11061,Hunter x Hunter (2011),9.722685
3,50,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,8.212674
2,50,1010,Ranma ½,8.023918
1,50,2514,Detective Conan OVA 09,7.663447
4,200,11061,Hunter x Hunter (2011),10.0
7,200,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,9.069603
6,200,1010,Ranma ½,8.751784
5,200,2514,Detective Conan OVA 09,8.678579
8,400,11061,Hunter x Hunter (2011),8.689244
11,400,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,6.354356
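
The biased SVD model estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. By default Surprise's `predict` then clips the result to the rating scale, which is consistent with user 200 receiving exactly 10.0 for Hunter x Hunter (2011). A minimal sketch with made-up parameter values (`svd_estimate` is an illustrative helper, not a Surprise function):

```python
def svd_estimate(mu, b_u, b_i, p_u, q_i, rating_scale=(1, 10)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    raw = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    low, high = rating_scale
    return min(high, max(low, raw))

# Made-up numbers, chosen so the raw score (about 10.28) exceeds the scale
clipped = svd_estimate(mu=7.9, b_u=1.2, b_i=0.9, p_u=[0.5, -0.1], q_i=[0.6, 0.2])

# With zero biases and factors the estimate is just the global mean
unclipped = svd_estimate(mu=7.9, b_u=0.0, b_i=0.0, p_u=[0.0, 0.0], q_i=[0.0, 0.0])

print(clipped, unclipped)
```

A clipped score hides how far above the scale the raw estimate was, so ties at 10.0 cannot be ranked against each other without the unclipped values.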


In [None]:
est

Prediction(uid=800, iid=1257, r_ui=None, est=7.938865960263387, details={'was_impossible': False})

In [None]:
df_test[df_test['user_id'] == 50]

Unnamed: 0,user_id,anime_id,title,predicted_rating
0,50,11061,Hunter x Hunter (2011),9.722685
3,50,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,8.212674
2,50,1010,Ranma ½,8.023918
1,50,2514,Detective Conan OVA 09,7.663447


In [None]:
df_test[df_test['user_id'] == 200]

Unnamed: 0,user_id,anime_id,title,predicted_rating
4,200,11061,Hunter x Hunter (2011),10.0
7,200,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,9.069603
6,200,1010,Ranma ½,8.751784
5,200,2514,Detective Conan OVA 09,8.678579


In [None]:
df_test[df_test['user_id'] == 400]

Unnamed: 0,user_id,anime_id,title,predicted_rating
8,400,11061,Hunter x Hunter (2011),8.689244
11,400,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,6.354356
10,400,1010,Ranma ½,6.349191
9,400,2514,Detective Conan OVA 09,6.13516


In [None]:
df_test[df_test['user_id'] == 800]

Unnamed: 0,user_id,anime_id,title,predicted_rating
12,800,11061,Hunter x Hunter (2011),9.891175
14,800,1010,Ranma ½,8.259108
13,800,2514,Detective Conan OVA 09,7.968935
15,800,1257,Saint Seiya: Meiou Hades Juuni Kyuu-hen,7.938866
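
To answer the ordering question directly, the per-user tables above can be turned into ranked lists. The sketch below copies the predicted ratings from those four tables and sorts each user's titles from highest to lowest. The dictionary layout is just one convenient way to hold them:

```python
# Predicted ratings copied from the four per-user tables above
predicted = {
    50: {'Hunter x Hunter (2011)': 9.722685,
         'Saint Seiya: Meiou Hades Juuni Kyuu-hen': 8.212674,
         'Ranma ½': 8.023918,
         'Detective Conan OVA 09': 7.663447},
    200: {'Hunter x Hunter (2011)': 10.0,
          'Saint Seiya: Meiou Hades Juuni Kyuu-hen': 9.069603,
          'Ranma ½': 8.751784,
          'Detective Conan OVA 09': 8.678579},
    400: {'Hunter x Hunter (2011)': 8.689244,
          'Saint Seiya: Meiou Hades Juuni Kyuu-hen': 6.354356,
          'Ranma ½': 6.349191,
          'Detective Conan OVA 09': 6.13516},
    800: {'Hunter x Hunter (2011)': 9.891175,
          'Ranma ½': 8.259108,
          'Detective Conan OVA 09': 7.968935,
          'Saint Seiya: Meiou Hades Juuni Kyuu-hen': 7.938866},
}

# Rank each user's titles by predicted rating, highest first
rankings = {user: sorted(scores, key=scores.get, reverse=True)
            for user, scores in predicted.items()}

for user, ranking in rankings.items():
    print(user, ranking)
```

Hunter x Hunter (2011) comes first for every user. Users 50, 200, and 400 share the same order, while user 800 places Ranma ½ and Detective Conan OVA 09 ahead of Saint Seiya.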


## **Anime recommendations for a single user**

In [None]:
df_merged[df_merged['user_id']==1]

Unnamed: 0,user_id,anime_id,user_rating,name,genre
0,1,8074,10.0,Highschool of the Dead,"Action, Ecchi, Horror, Supernatural"
1,1,11617,10.0,High School DxD,"Comedy, Demons, Ecchi, Harem, Romance, School"
2,1,11757,10.0,Sword Art Online,"Action, Adventure, Fantasy, Game, Romance"
3,1,15451,10.0,High School DxD New,"Action, Comedy, Demons, Ecchi, Harem, Romance,..."


In [None]:
df_merged['anime_id'].nunique()

4510

In [None]:
# Score every anime for a given user
user_id = 1

# Unique (anime_id, name) pairs, so ids and titles stay aligned
anime_lookup = df_merged[['anime_id', 'name']].drop_duplicates(subset='anime_id')
anime = list(anime_lookup['anime_id'])
name = list(anime_lookup['name'])

In [None]:
svd_predict = SVD(n_epochs=20, lr_all=0.005, reg_all=0.02)
svd_predict.fit(trainset)

# Predict a score for every anime for user 1
anime_score = [svd_predict.predict(user_id, anime_id).est for anime_id in anime]
anime_score

[9.506169485809929,
 9.301126094068161,
 9.599297492835074,
 9.46106101800944,
 9.288912703856187,
 8.642496695530218,
 8.150787411167004,
 9.442175072492201,
 9.466280766302313,
 7.381592411616245,
 8.578186931630489,
 8.106137449746297,
 8.490287689791765,
 8.798881770598697,
 9.405417322615559,
 8.155136772716256,
 7.698280403525725,
 8.137479161857096,
 7.474213128446633,
 7.690318306461544,
 8.129843510748447,
 8.254108149796393,
 9.357028873712803,
 7.719323786486716,
 8.901069480853742,
 8.307000399878087,
 8.863852102038342,
 7.661322018814286,
 7.2223520314405185,
 8.433873067382592,
 9.22950676277453,
 8.201448887777703,
 9.526755729860891,
 8.22838555630844,
 8.048313760832594,
 9.24854875760787,
 8.349081277264416,
 8.08542893551461,
 8.041589277006704,
 8.670188168212206,
 8.566412574228965,
 7.847369746943322,
 8.845264897982382,
 9.542626169717805,
 8.88691862892219,
 8.393624472337235,
 8.745209851251346,
 7.648368717684677,
 8.47583595093268,
 8.447801900723181,
 7.768

In [None]:
# Recommendations for a single user
recomToUser = pd.DataFrame({
                            'anime_id': anime,
                            'title':name,
                            'score': anime_score
                            }).sort_values(by='score', ascending=False)

recomToUser.head(20)

Unnamed: 0,anime_id,title,score
297,9969,Gintama&#039;,9.945739
895,6114,Rainbow: Nisha Rokubou no Shichinin,9.94171
376,15335,Gintama Movie: Kanketsu-hen - Yorozuya yo Eien...,9.933015
586,11061,Hunter x Hunter (2011),9.92548
289,9253,Steins;Gate,9.861717
90,28891,Haikyuu!! Second Season,9.835643
549,2904,Code Geass: Hangyaku no Lelouch R2,9.826443
126,245,Great Teacher Onizuka,9.818246
592,11981,Mahou Shoujo Madoka★Magica Movie 3: Hangyaku n...,9.790218
898,6675,Redline,9.751407
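
The scores above are computed for every anime in the dataset, including the ones the user has already rated. In practice one would usually drop those before recommending. A small sketch of that filter, with made-up scores (`recommend_unseen`, `toy_scores`, and `seen_by_user` are illustrative names, not part of the notebook):

```python
def recommend_unseen(scores, seen, top_n=3):
    """scores: {anime_id: predicted score}; seen: ids the user already rated."""
    candidates = {a: s for a, s in scores.items() if a not in seen}
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]

# Made-up predicted scores keyed by anime_id
toy_scores = {8074: 9.5, 11617: 9.3, 9969: 9.9, 6114: 9.8, 11061: 9.7}

# Anime the user has already rated, which should not be recommended again
seen_by_user = {8074, 11617}

top = recommend_unseen(toy_scores, seen_by_user)
print(top)
```

With the notebook's data, the seen set for a user could be taken from the anime_id values in that user's rows of `df_merged`.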
