## SlopeOne
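SlopeOne is a simple yet effective collaborative filtering algorithm based purely on rating statistics. To estimate user $u$'s rating of item $i$, it starts from the user's mean rating $\mu_u$ and adds the average of the deviations $dev(i,j)$ between item $i$ and each item $j$ that the user has already rated. Here $dev(i,j)$ is the average difference between the ratings of $i$ and $j$ over $U_{ij}$, the set of users who rated both items, and $R_i(u)$ is the set of items $j$ rated by $u$ that share at least one rater with $i$. The prediction function is: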
$$
\begin{aligned}
\hat{r}_{ui} &= \mu_u + \frac{\sum_{j \in R_i(u)} dev(i,j)}{|R_i(u)|} \\
dev(i,j) &= \frac{\sum_{u \in U_{ij}} (r_{ui} - r_{uj})}{|U_{ij}|}
\end{aligned}
$$
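
To make the two formulas concrete, here is a minimal pure-Python sketch that evaluates them on a tiny hand-made rating table. The `ratings` dictionary and the helper names `dev` and `predict` are hypothetical and exist only for this illustration; this is not the Surprise implementation.

```python
# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 2.0},
    "bob":   {"A": 3.0, "B": 4.0},
    "carol": {"B": 2.0, "C": 5.0},
}

def dev(i, j):
    """Average of r_ui - r_uj over users who rated both i and j (None if U_ij is empty)."""
    diffs = [r[i] - r[j] for r in ratings.values() if i in r and j in r]
    return sum(diffs) / len(diffs) if diffs else None

def predict(u, i):
    """Estimate user u's rating of item i as mu_u plus the mean deviation over R_i(u)."""
    user_ratings = ratings[u]
    mu_u = sum(user_ratings.values()) / len(user_ratings)
    # R_i(u): items rated by u that share at least one rater with i.
    # Assumption of this sketch: the target item i itself is skipped.
    devs = [dev(i, j) for j in user_ratings if j != i]
    devs = [d for d in devs if d is not None]
    if not devs:
        return mu_u  # no usable deviations: fall back to the user's mean
    return mu_u + sum(devs) / len(devs)

# bob: mu = 3.5, dev(C, A) = -3.0, dev(C, B) = 1.0
print(predict("bob", "C"))  # 2.5
```

The sketch applies no clipping to the rating scale, so an estimate can fall outside the range of observed ratings (for example, `predict("carol", "A")` exceeds 5).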

In [1]:
from surprise import SlopeOne, accuracy, Dataset
from surprise.model_selection import train_test_split

In [2]:
data = Dataset.load_builtin("ml-100k")
trainset, testset = train_test_split(data, test_size=.2, shuffle=True, random_state=10)

In [3]:
model = SlopeOne()
%time model.fit(trainset)

Wall time: 476 ms


<surprise.prediction_algorithms.slope_one.SlopeOne at 0x1393a23cac8>

In [4]:
# Average rating deviation between items i and j, i.e. dev(i, j)
dev = model.dev
dev.shape

(1653, 1653)

In [5]:
# Number of users who rated both items i and j, i.e. |U_ij|
freq = model.freq
freq.shape

(1653, 1653)
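
As a rough illustration of what these two item-by-item matrices contain, the sketch below builds comparable `dev` and `freq` arrays with NumPy from a toy rating table. The `user_ratings` layout (inner item id, rating pairs per user) is an assumption chosen to resemble `trainset.ur`, and the loop is only one possible way to fill the matrices, not necessarily how Surprise computes them.

```python
import numpy as np

# Hypothetical toy data: user_ratings[u] is a list of (inner item id, rating) pairs
user_ratings = {
    0: [(0, 5.0), (1, 3.0), (2, 2.0)],
    1: [(0, 3.0), (1, 4.0)],
    2: [(1, 2.0), (2, 5.0)],
}
n_items = 3

freq = np.zeros((n_items, n_items), dtype=int)  # freq[i, j] = |U_ij|
diff_sum = np.zeros((n_items, n_items))         # running sum of r_ui - r_uj

for pairs in user_ratings.values():
    # Every pair of items rated by the same user contributes one co-rating
    for i, r_ui in pairs:
        for j, r_uj in pairs:
            freq[i, j] += 1
            diff_sum[i, j] += r_ui - r_uj

# dev[i, j] is the average difference; entries with no common rater stay 0
dev = np.divide(diff_sum, freq, out=np.zeros_like(diff_sum), where=freq > 0)

print(freq)
print(dev)
```

In this construction `freq` is symmetric and `dev` is antisymmetric (`dev[i, j] == -dev[j, i]`), and both have shape `(n_items, n_items)`, which corresponds to the `(1653, 1653)` shapes reported for the fitted model.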

In [7]:
# Mean rating of each user
u_mean = model.user_mean
len(u_mean)

943

In [8]:
pred = model.test(testset)
pred[:5]

[Prediction(uid='154', iid='302', r_ui=4.0, est=4.261684397972672, details={'was_impossible': False}),
 Prediction(uid='896', iid='484', r_ui=4.0, est=3.690241759177761, details={'was_impossible': False}),
 Prediction(uid='230', iid='371', r_ui=4.0, est=3.2878762789497538, details={'was_impossible': False}),
 Prediction(uid='234', iid='294', r_ui=3.0, est=2.427146899332413, details={'was_impossible': False}),
 Prediction(uid='25', iid='729', r_ui=4.0, est=3.504304157752663, details={'was_impossible': False})]

In [9]:
accuracy.rmse(pred)

RMSE: 0.9362


0.9362315516670194

In [10]:
trainset.to_inner_iid('302'), trainset.to_inner_uid('154')

(171, 738)

In [12]:
# Check that a manual computation matches the first row of pred
ur_738 = trainset.ur[738]
est = u_mean[738]
# Items rated by user u that share at least one rater with item 171, i.e. R_i(u)
Ri = [iid for (iid, _) in ur_738 if freq[171, iid] > 0]
if Ri:
    est += sum([dev[171,j] for j in Ri])/len(Ri)
est

4.261684397972672