<a href="https://colab.research.google.com/github/KanadeSisido/Learning-RecommenderSystems-with-X/blob/main/Surprise_Learn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Scikit-Surprise
Reference material: https://www.salesanalytics.co.jp/datascience/datascience180/

Surprise requires a NumPy 1.x release, so NumPy is pinned to `<2` during installation.

In [1]:
!pip install scikit-surprise
!pip install "numpy<2"
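If you want to confirm which NumPy version ended up installed before importing Surprise, a small check like the one below can help. It is our own sketch, not part of Surprise. `is_numpy_1x` is a hypothetical helper name, and it only inspects the major version number. On Colab, NumPy is preinstalled, so a runtime restart may be needed after the downgrade before the 1.x version is actually picked up.

```python
from importlib.metadata import PackageNotFoundError, version


def is_numpy_1x(ver: str) -> bool:
    """Return True if a version string such as '1.26.4' is from the NumPy 1.x series."""
    # Compare only the major component, so '10.0.0' is not mistaken for 1.x
    return ver.split(".")[0] == "1"


try:
    installed = version("numpy")  # reads installed package metadata without importing numpy
    print(f"numpy {installed} installed; 1.x series: {is_numpy_1x(installed)}")
except PackageNotFoundError:
    print("numpy is not installed")
```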



In [27]:
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

import pandas as pd

## Load the data

Load the MovieLens 100K dataset (`ml-100k`). On first use, Surprise asks whether to download it.

In [3]:
data = Dataset.load_builtin('ml-100k')

Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from https://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k
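For `ml-100k`, the ratings come from the archive's `u.data` file, which stores one rating per line as tab-separated `user id`, `item id`, `rating`, `timestamp`. The sketch below mimics that parsing on two lines matching the first rows of the dataset. It shows the shape of each entry in `data.raw_ratings`: user and item IDs remain strings and only the rating becomes a float. `parse_ml100k_line` is a hypothetical helper, not Surprise's internal reader.

```python
from typing import List, Tuple

# (user, item, rating, timestamp), mirroring one entry of data.raw_ratings
RawRating = Tuple[str, str, float, str]


def parse_ml100k_line(line: str) -> RawRating:
    """Parse one tab-separated u.data line (hypothetical helper, not Surprise's reader)."""
    user, item, rating, timestamp = line.rstrip("\n").split("\t")
    # Raw IDs stay strings; only the rating is converted to float
    return user, item, float(rating), timestamp


sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
raw_ratings: List[RawRating] = [parse_ml100k_line(l) for l in sample.splitlines()]
print(raw_ratings)
# [('196', '242', 3.0, '881250949'), ('186', '302', 3.0, '891717742')]
```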


In [29]:
dataFrame = pd.DataFrame(data.raw_ratings, columns=["user", "item", "rate", "timestamp"])
dataFrame.head()

  user item  rate  timestamp
0  196  242   3.0  881250949
1  186  302   3.0  891717742
2   22  377   1.0  878887116
3  244   51   2.0  880606923
4  166  346   1.0  886397596
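Because raw IDs are strings, filters on the `user` or `item` columns must compare against strings. This matters in the exercise at the end, which selects user `'100'`. The toy rows below are hypothetical (only the first matches the real data). They illustrate that an integer comparison silently matches nothing instead of raising an error.

```python
import pandas as pd

# Hypothetical toy rows shaped like data.raw_ratings: IDs are strings, not ints
raw = [("196", "242", 3.0, "881250949"), ("100", "302", 4.0, "891717742")]
df = pd.DataFrame(raw, columns=["user", "item", "rate", "timestamp"])

by_int = df[df["user"] == 100]    # str vs int: every comparison is False
by_str = df[df["user"] == "100"]  # matches the raw string ID
print(len(by_int), len(by_str))   # 0 1
```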


In [13]:
trainset, testset = train_test_split(data, test_size=.20)
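`train_test_split` shuffles the ratings and holds out the given fraction (here 20%) as the test set. No `random_state` is passed, so each run produces a different split, and the RMSE and MAE below will vary slightly between runs. Passing, for example, `random_state=0` to Surprise's `train_test_split` makes the split reproducible. The plain-Python sketch below illustrates the idea on toy tuples. `split_ratings` is a hypothetical helper, not Surprise's implementation, and it may round the test-set size differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_ratings(ratings: Sequence[T], test_size: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle ratings with a fixed seed and hold out a fraction as the test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


# 100 toy (user, item, rating) tuples
ratings = [(str(u), str(i), float((u + i) % 5 + 1)) for u in range(10) for i in range(10)]
train, test = split_ratings(ratings, test_size=0.2, seed=42)
print(len(train), len(test))  # 80 20
```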

## Create an instance of the recommendation algorithm
Here we use SVD.
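Despite its name, Surprise's `SVD` is the matrix-factorization model popularized by Simon Funk during the Netflix Prize. With its default `biased=True`, it predicts $\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u$, where $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, and $p_u$, $q_i$ are latent factor vectors. The function below evaluates that formula for hand-picked toy numbers, which are not taken from a trained model. In practice the learned values live inside the fitted `algo`, and `algo.predict` additionally clips estimates to the rating scale (1 to 5 here), which this sketch does not do.

```python
from typing import Sequence


def predict_rating(mu: float, b_u: float, b_i: float,
                   p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """Biased MF estimate: global mean + user bias + item bias + dot(q_i, p_u)."""
    return mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))


# Hypothetical toy parameters with 2 latent factors
est = predict_rating(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, -0.2], q_i=[0.4, 0.3])
print(round(est, 3))  # 3.74
```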

In [8]:
algo = SVD()

## Train the model

In [10]:
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7d70b02c7a10>
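`fit` learns the biases and factors by stochastic gradient descent over the training ratings. For each rating, the error $e_{ui} = r_{ui} - \hat{r}_{ui}$ drives updates such as $b_u \leftarrow b_u + \gamma (e_{ui} - \lambda b_u)$ and $p_u \leftarrow p_u + \gamma (e_{ui} q_i - \lambda p_u)$, with learning rate $\gamma$ and regularization $\lambda$. Analogous updates apply to $b_i$ and $q_i$. The sketch below follows those update rules in plain Python on a toy dataset. It is an illustration under our own assumptions, not Surprise's Cython implementation. Compared with Surprise's defaults (`n_factors=100`, `n_epochs=20`, `lr_all=0.005`, `reg_all=0.02`), it uses fewer factors, more epochs, and a larger learning rate so that it converges on tiny data. `fit_biased_mf` and `train_rmse` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating)


def fit_biased_mf(ratings: List[Rating], n_factors: int = 2, n_epochs: int = 200,
                  lr: float = 0.01, reg: float = 0.02, seed: int = 0):
    """Fit mu, user/item biases and latent factors with plain SGD (illustrative sketch)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    bu: Dict[str, float] = {}
    bi: Dict[str, float] = {}
    pu: Dict[str, List[float]] = {}
    qi: Dict[str, List[float]] = {}
    for u, i, _ in ratings:
        # Biases start at 0, factors at small Gaussian noise
        if u not in pu:
            bu[u] = 0.0
            pu[u] = [rng.gauss(0.0, 0.1) for _ in range(n_factors)]
        if i not in qi:
            bi[i] = 0.0
            qi[i] = [rng.gauss(0.0, 0.1) for _ in range(n_factors)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(a * b for a, b in zip(pu[u], qi[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                # Use the pre-update values of both factors in each step
                puf, qif = pu[u][f], qi[i][f]
                pu[u][f] += lr * (err * qif - reg * puf)
                qi[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, pu, qi


def train_rmse(ratings: List[Rating], mu, bu, bi, pu, qi) -> float:
    """Root-mean-square error of the fitted model on the given ratings."""
    sq = [(r - (mu + bu[u] + bi[i] + sum(a * b for a, b in zip(pu[u], qi[i])))) ** 2
          for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


toy = [("a", "x", 5.0), ("a", "y", 4.0), ("b", "x", 4.0),
       ("b", "z", 1.0), ("c", "y", 2.0), ("c", "z", 1.0)]
model = fit_biased_mf(toy)
baseline = math.sqrt(sum((r - model[0]) ** 2 for _, _, r in toy) / len(toy))
print(f"predict-the-mean RMSE: {baseline:.3f}, fitted RMSE: {train_rmse(toy, *model):.3f}")
```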

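## Testing the recommender
Run inference on the test set and measure the model's accuracy. `accuracy.rmse` and `accuracy.mae` both print their score and also return it, so the number echoed at the end of the cell is the MAE.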

In [11]:
predictions = algo.test(testset)

In [12]:
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9380
MAE:  0.7391


0.7390597232103736
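RMSE and MAE summarize the gap between each true rating $r_{ui}$ and its estimate $\hat{r}_{ui}$ over the $N$ test predictions: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum (r_{ui} - \hat{r}_{ui})^2}$ and $\mathrm{MAE} = \frac{1}{N}\sum |r_{ui} - \hat{r}_{ui}|$. Because RMSE squares the errors, it penalizes large misses more heavily and is never smaller than MAE. A plain-Python version is shown below, applied to the first three (true, estimated) pairs printed in the next section. The helper names are our own, not Surprise's.

```python
import math
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (true rating, estimated rating)


def rmse(pairs: Sequence[Pair]) -> float:
    """Root mean squared error over (true, estimate) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs: Sequence[Pair]) -> float:
    """Mean absolute error over (true, estimate) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


pairs = [(5.0, 3.646246), (4.0, 4.095009), (1.0, 1.162487)]
print(f"RMSE: {rmse(pairs):.4f}  MAE: {mae(pairs):.4f}")
```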

## Viewing predictions as a DataFrame

In [30]:
# Collect the first 10 predictions as [user, item, true rating, estimate] rows
head = [[uid, iid, true_r, est] for uid, iid, true_r, est, _ in predictions[:10]]

top_df = pd.DataFrame(head, columns=["UserId", "ItemId", "trueRate", "Estimated"])

print(top_df)

  UserId ItemId  trueRate  Estimated
0    716    501       5.0   3.646246
1    259    200       4.0   4.095009
2    405    379       1.0   1.162487
3    269    825       1.0   1.908681
4    787    352       2.0   2.135792
5    717    825       2.0   2.948920
6    114    210       3.0   3.614627
7    207     25       4.0   3.215446
8    938    118       5.0   3.014688
9    554    282       3.0   3.838258
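Each element returned by `algo.test` is a `Prediction` named tuple with the fields `uid`, `iid`, `r_ui` (true rating), `est` (estimate), and `details`. That is why the code above can unpack five values per prediction. Because pandas takes column names from named-tuple fields, the list can also be turned into a DataFrame directly with `pd.DataFrame(predictions)`. The sketch below reproduces this with a stand-in named tuple (an assumption, not imported from Surprise) and two rows taken from the output above, then adds an absolute-error column.

```python
from collections import namedtuple

import pandas as pd

# Stand-in with the same fields as Surprise's Prediction (not the real class)
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

preds = [
    Prediction("716", "501", 5.0, 3.646246, {"was_impossible": False}),
    Prediction("259", "200", 4.0, 4.095009, {"was_impossible": False}),
]

df = pd.DataFrame(preds)  # column names come from the named-tuple fields
df["abs_err"] = (df["r_ui"] - df["est"]).abs()
print(df[["uid", "iid", "r_ui", "est", "abs_err"]])
```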


## Practice
Choose 10 movie `itemid`s to recommend to the user with `UserId` == 100.

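### Approach
1. Extract the IDs of MovieLens movies that the user has not rated yet.
2. Predict the user's rating for each of those movies.
3. Take the 10 movies with the highest predicted ratings.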
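The three steps amount to a filter followed by a top-N selection. Below is a generic sketch where `estimate` stands in for something like `lambda iid: algo.predict('100', iid).est`. The scores are made-up toy values, and `top_n_unrated` is a hypothetical helper. Using `heapq.nlargest` avoids sorting every candidate when only the best 10 are needed.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple


def top_n_unrated(all_items: Iterable[str], rated: Set[str],
                  estimate: Callable[[str], float], n: int = 10) -> List[Tuple[str, float]]:
    """Score every item the user has not rated and keep the n highest, best first."""
    # Steps 1 and 2: skip already-rated items and predict the rest
    candidates = ((iid, estimate(iid)) for iid in all_items if iid not in rated)
    # Step 3: keep the n highest estimated ratings
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


toy_scores = {"i1": 4.5, "i2": 4.1, "i3": 3.9, "i4": 2.0}  # hypothetical estimates
print(top_n_unrated(toy_scores, {"i1"}, toy_scores.__getitem__, n=2))
# [('i2', 4.1), ('i3', 3.9)]
```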

In [42]:
# Unique item IDs across the whole dataset
items = dataFrame['item'].unique()
# Rows for user 100 (raw IDs are strings, hence '100')
user_100 = dataFrame[dataFrame['user'] == '100']
# Items user 100 has already rated (a set for fast membership tests)
rated_user_100 = set(user_100['item'])

# Items user 100 has not rated yet
unrated_user_100 = [item for item in items if item not in rated_user_100]

preds = []

for iid in unrated_user_100:
  pred = algo.predict('100', iid)
  preds.append([pred.uid, pred.iid, pred.est])

pred_df = pd.DataFrame(preds, columns=["userId", "itemId", "estimated"])

top10 = pred_df.sort_values('estimated', ascending=False).head(10)

print(top10)

    userId itemId  estimated
301    100    173   4.417724
188    100    318   4.403931
335    100     50   4.288973
169    100    408   4.203804
226    100     64   4.197323
259    100    169   4.185017
95     100    174   4.160746
227    100    357   4.117058
34     100    603   4.110761
163    100    483   4.103984
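Two caveats apply when going further. First, `algo` here was fitted on only the 80% training split. The top-N recipe in Surprise's FAQ instead fits on `data.build_full_trainset()` and scores every unrated (user, item) pair produced by `trainset.build_anti_testset()`. Second, the same ranking can be done for all users at once by grouping predictions by user. The sketch below applies that grouping to plain tuples shaped like Surprise's predictions, `(uid, iid, r_ui, est, details)`. `get_top_n` borrows the FAQ's helper name, but this body is our own, and the toy predictions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def get_top_n(predictions: Iterable[tuple], n: int = 10) -> Dict[str, List[Tuple[str, float]]]:
    """Group (uid, iid, r_ui, est, details) tuples by user; keep each user's n best estimates."""
    by_user: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for uid, iid, _r_ui, est, _details in predictions:
        by_user[uid].append((iid, est))
    # Sort each user's candidates by estimate, highest first, and truncate to n
    return {uid: sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
            for uid, pairs in by_user.items()}


toy_preds = [
    ("u1", "i1", None, 4.4, {}),
    ("u1", "i2", None, 4.3, {}),
    ("u1", "i3", None, 3.1, {}),
    ("u2", "i2", None, 3.8, {}),
]
top = get_top_n(toy_preds, n=2)
print(top)
# {'u1': [('i1', 4.4), ('i2', 4.3)], 'u2': [('i2', 3.8)]}
```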
