## Surprise: a Python recommender system library

![](./Surprise.png)

For recommender-system modeling we will use the Python library [Surprise (Simple Python RecommendatIon System Engine)](https://github.com/NicolasHug/Surprise), a member of the scikit family (many readers will have used scikit-learn or scikit-image).

### Easy to use, with support for many recommendation algorithms:
* [Baseline algorithms](http://surprise.readthedocs.io/en/stable/basic_algorithms.html)
* [Neighborhood methods (collaborative filtering)](http://surprise.readthedocs.io/en/stable/knn_inspired.html)
* [Matrix factorization-based methods (SVD, PMF, SVD++, NMF)](http://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD)

| Algorithm class | Description |
| ------------- |:-----|
|[random_pred.NormalPredictor](http://surprise.readthedocs.io/en/stable/basic_algorithms.html#surprise.prediction_algorithms.random_pred.NormalPredictor)|Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.|
|[baseline_only.BaselineOnly](http://surprise.readthedocs.io/en/stable/basic_algorithms.html#surprise.prediction_algorithms.baseline_only.BaselineOnly)|Algorithm predicting the baseline estimate for given user and item.|
|[knns.KNNBasic](http://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNBasic)|A basic collaborative filtering algorithm.|
|[knns.KNNWithMeans](http://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNWithMeans)|A basic collaborative filtering algorithm, taking into account the mean ratings of each user.|
|[knns.KNNBaseline](http://surprise.readthedocs.io/en/stable/knn_inspired.html#surprise.prediction_algorithms.knns.KNNBaseline)|A basic collaborative filtering algorithm taking into account a baseline rating.|	
|[matrix_factorization.SVD](http://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD)|The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.|
|[matrix_factorization.SVDpp](http://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVDpp)|The SVD++ algorithm, an extension of SVD taking into account implicit ratings.|
|[matrix_factorization.NMF](http://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.NMF)|A collaborative filtering algorithm based on Non-negative Matrix Factorization.|
|[slope_one.SlopeOne](http://surprise.readthedocs.io/en/stable/slope_one.html#surprise.prediction_algorithms.slope_one.SlopeOne)|A simple yet accurate collaborative filtering algorithm.|
|[co_clustering.CoClustering](http://surprise.readthedocs.io/en/stable/co_clustering.html#surprise.prediction_algorithms.co_clustering.CoClustering)|A collaborative filtering algorithm based on co-clustering.|

### The neighborhood-based methods (collaborative filtering) can use different similarity measures.

| Similarity measure | Description |
| ------------- |:-----|
|[cosine](http://surprise.readthedocs.io/en/stable/similarities.html#surprise.similarities.cosine)|Compute the cosine similarity between all pairs of users (or items).|
|[msd](http://surprise.readthedocs.io/en/stable/similarities.html#surprise.similarities.msd)|Compute the Mean Squared Difference similarity between all pairs of users (or items).|
|[pearson](http://surprise.readthedocs.io/en/stable/similarities.html#surprise.similarities.pearson)|Compute the Pearson correlation coefficient between all pairs of users (or items).|
|[pearson_baseline](http://surprise.readthedocs.io/en/stable/similarities.html#surprise.similarities.pearson_baseline)|Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items) using baselines for centering instead of means.|
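To make the first two rows of the table concrete, here is a minimal pure-Python sketch of cosine and MSD similarity between two users. It assumes, as the Surprise documentation describes, that only items rated by both users are considered and that the MSD similarity is `1 / (msd + 1)`. The function names and the ratings are made up for illustration and are not Surprise's own implementation.

```python
import math


def cosine_sim(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def msd_sim(ratings_a, ratings_b):
    """Similarity derived from the mean squared difference: 1 / (msd + 1)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


# Hypothetical ratings: item id -> rating
alice = {"m1": 4.0, "m2": 2.0, "m3": 5.0}
bob = {"m1": 4.0, "m2": 3.0, "m4": 1.0}

print(cosine_sim(alice, bob))
print(msd_sim(alice, bob))
```

Only `m1` and `m2` are shared, so both measures ignore `m3` and `m4` entirely.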

### Jaccard similarity
Number of elements in the intersection / number of elements in the union.
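The definition above translates directly into a few lines of Python. This is a small sketch; the function name and the two playlists are hypothetical.

```python
def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        # Two empty sets: define the similarity as 0 to avoid dividing by zero
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical playlists described by the songs they contain
playlist_a = ["song1", "song2", "song3"]
playlist_b = ["song2", "song3", "song4", "song5"]

# 2 shared songs out of 5 distinct songs
print(jaccard_similarity(playlist_a, playlist_b))
```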

### Different evaluation measures are supported
| Measure | Description |
| ------------- |:-----|
|[rmse](http://surprise.readthedocs.io/en/stable/accuracy.html#surprise.accuracy.rmse)|Compute RMSE (Root Mean Squared Error).|
|[mae](http://surprise.readthedocs.io/en/stable/accuracy.html#surprise.accuracy.mae)|Compute MAE (Mean Absolute Error).|
|[fcp](http://surprise.readthedocs.io/en/stable/accuracy.html#surprise.accuracy.fcp)|Compute FCP (Fraction of Concordant Pairs).|
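For reference, RMSE and MAE can be computed by hand from pairs of true and estimated ratings. This is a minimal sketch; the helper names and the example pairs are made up, and in practice `surprise.accuracy` does this for you.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


# Hypothetical (true, estimated) ratings
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5)]

print(rmse(pairs))
print(mae(pairs))
```

RMSE penalizes the single large error (2.0 vs 3.0) more heavily than MAE does, which is why it comes out larger here.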

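### Usage examples

#### Basic usage

```python
# Any of the recommendation algorithms listed above can be used
from surprise import SVD
from surprise import Dataset
# evaluate and print_perf belong to older Surprise releases;
# newer releases provide surprise.model_selection.cross_validate instead
from surprise import evaluate, print_perf

# Load the built-in MovieLens dataset
data = Dataset.load_builtin('ml-100k')
# k-fold cross-validation (k=3)
data.split(n_folds=3)
# Try SVD matrix factorization
algo = SVD()
# Evaluate the algorithm on the dataset
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
# Print the results
print_perf(perf)
```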
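Conceptually, the 3-fold split used above shuffles the ratings, cuts them into three parts, and lets each part serve once as the test set while the other two form the training set. The following pure-Python sketch illustrates that idea; `k_fold_splits` is a hypothetical helper, not the Surprise implementation.

```python
import random


def k_fold_splits(ratings, n_folds, seed=0):
    """Yield (trainset, testset) pairs; every rating is tested exactly once."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings into n_folds roughly equal parts
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        testset = folds[i]
        trainset = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield trainset, testset


# Ten hypothetical ratings, identified here only by an index
ratings = list(range(10))
for trainset, testset in k_fold_splits(ratings, n_folds=3):
    print(len(trainset), len(testset))
```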

```python
# Any of the recommendation algorithms listed above can be used
from surprise import SVD, KNNWithMeans
from surprise import Dataset
from surprise import evaluate, print_perf

# Load the built-in MovieLens dataset
data = Dataset.load_builtin('ml-100k')
# k-fold cross-validation (k=3)
data.split(n_folds=3)
# SVD is commented out here; KNNWithMeans is used instead
# algo = SVD()
algo = KNNWithMeans()
# Evaluate the algorithm on the dataset
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
# Print the results
print_perf(perf)
```

```text
Evaluating RMSE, MAE of algorithm KNNWithMeans.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9575
MAE:  0.7536
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9555
MAE:  0.7542
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9586
MAE:  0.7543
------------
------------
Mean RMSE: 0.9572
Mean MAE : 0.7540
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9575  0.9555  0.9586  0.9572  
MAE     0.7536  0.7542  0.7543  0.7540  
```


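#### Loading your own dataset

```python
import os

from surprise import Dataset, Reader

# Path to the data file
file_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.data')
# Tell the reader how each line of the file is laid out
reader = Reader(line_format='user item rating timestamp', sep='\t')
# Load the data
data = Dataset.load_from_file(file_path, reader=reader)
# Manually split into 5 folds (for cross-validation)
data.split(n_folds=5)
```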
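To see what `line_format` and `sep` describe, here is a rough sketch of how one line of such a file could be turned into a rating tuple. `parse_line` is a hypothetical helper written for illustration; it is not how `Reader` is actually implemented.

```python
def parse_line(line, line_format="user item rating timestamp", sep="\t"):
    """Split one line on `sep` and name the fields according to `line_format`."""
    fields = line_format.split()
    values = line.strip().split(sep)
    record = dict(zip(fields, values))
    # Ids stay strings (raw ids); only the rating is converted to a number
    return (record["user"], record["item"], float(record["rating"]),
            record.get("timestamp"))


# A tab-separated line in the u.data layout
print(parse_line("709\t125\t4\t879847730"))
```

The resulting tuple has the same shape as the entries of `data.raw_ratings`.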

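#### Tuning the algorithms (to get better recommendations)

The algorithms implemented here are ultimately trained with methods such as SGD, so they have hyperparameters that affect the final result. As with sklearn, we can use grid-search cross-validation (GridSearchCV) to choose the best parameters. A simple example: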

```python
from surprise import SVD, Dataset, GridSearch

# Define the grid of parameters to search over
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
# Grid search with cross-validation
grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])
# Find the best parameters on the dataset
data = Dataset.load_builtin('ml-100k')
data.split(n_folds=3)
grid_search.evaluate(data)

# Print the best RMSE score
print(grid_search.best_score['RMSE'])
# >>> 0.96117566386

# Print the parameters that gave the best RMSE score
print(grid_search.best_params['RMSE'])
# >>> {'reg_all': 0.4, 'lr_all': 0.005, 'n_epochs': 10}

# Best FCP score
print(grid_search.best_score['FCP'])
# >>> 0.702279736531

# Parameters that gave the highest FCP score
print(grid_search.best_params['FCP'])
# >>> {'reg_all': 0.6, 'lr_all': 0.005, 'n_epochs': 10}
```
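A grid search simply tries every combination of the listed values. As a sketch of what that means for the grid above, the combinations can be enumerated with `itertools.product`. This only illustrates the search space and is not the library's code.

```python
from itertools import product

param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}

# Build one dict per combination of parameter values
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[name] for name in names))]

# 2 * 2 * 2 = 8 candidate settings, each cross-validated in turn
print(len(combinations))
for params in combinations:
    print(params)
```

Since every combination is cross-validated, the cost grows multiplicatively with each parameter added to the grid.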

## Training models on our dataset

## Building and saving models

### 1. Building a model with collaborative filtering and making predictions

#### 1.1 MovieLens example

```python
# Any of the recommendation algorithms listed above can be used
from surprise import KNNWithMeans
from surprise import Dataset
from surprise import evaluate, print_perf

# Load the built-in MovieLens dataset
data = Dataset.load_builtin('ml-100k')
# k-fold cross-validation (k=3)
data.split(n_folds=3)
# Try KNNWithMeans (collaborative filtering that accounts for mean ratings)
algo = KNNWithMeans()
# Evaluate the algorithm on the dataset
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
# Print the results
print_perf(perf)
```

```text
Evaluating RMSE, MAE of algorithm KNNWithMeans.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9553
MAE:  0.7514
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9562
MAE:  0.7539
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9564
MAE:  0.7536
------------
------------
Mean RMSE: 0.9560
Mean MAE : 0.7530
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9553  0.9562  0.9564  0.9560  
MAE     0.7514  0.7539  0.7536  0.7530  
```


```python
data.raw_ratings[1]
```

```text
('709', '125', 4.0, '879847730')
```

```python
"""
The following code shows how, after fitting a collaborative filtering model,
to retrieve the items most similar to a given item.
The key function is algo.get_neighbors().
"""

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
import os
import io

from surprise import KNNBaseline
from surprise import Dataset


def read_item_names():
    """
    Build the movie id -> movie name and movie name -> movie id mappings.
    """

    file_name = (os.path.expanduser('~') +
                 '/.surprise_data/ml-100k/ml-100k/u.item')
    rid_to_name = {}
    name_to_rid = {}
    with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]

    return rid_to_name, name_to_rid


# First, train the algorithm so it computes the item-item similarities
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline', 'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.train(trainset)
```

```text
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.

<surprise.prediction_algorithms.knns.KNNBaseline at 0x2582fb37c50>
```

```python
# Build the movie id -> movie name and movie name -> movie id mappings
rid_to_name, name_to_rid = read_item_names()
```

```python
# Look up the raw item id of the movie Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_raw_id
```

```text
'1'
```

```python
# Convert the raw item id to the inner id used by the trainset
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
toy_story_inner_id
```

```text
24
```
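The raw id `'1'` and the inner id `24` differ because Surprise keeps two id spaces: raw ids as they appear in the data file, and integer inner ids used internally by the trainset. A minimal sketch of such a two-way mapping follows. `IdMapper` is a hypothetical class, and assigning inner ids in order of first appearance is an assumption made for illustration.

```python
class IdMapper:
    """Hypothetical helper mapping raw ids (strings) to dense integer inner ids."""

    def __init__(self):
        self.raw_to_inner = {}
        self.inner_to_raw = []

    def add(self, raw_id):
        # Assign the next free integer the first time a raw id is seen
        if raw_id not in self.raw_to_inner:
            self.raw_to_inner[raw_id] = len(self.inner_to_raw)
            self.inner_to_raw.append(raw_id)
        return self.raw_to_inner[raw_id]

    def to_inner(self, raw_id):
        return self.raw_to_inner[raw_id]

    def to_raw(self, inner_id):
        return self.inner_to_raw[inner_id]


mapper = IdMapper()
# Made-up raw item ids in the order they are encountered
for raw_item_id in ["242", "302", "377", "242"]:
    mapper.add(raw_item_id)

print(mapper.to_inner("377"))
print(mapper.to_raw(0))
```

Functions such as `get_neighbors()` work with inner ids, which is why the examples here convert with `to_inner_iid` before the call and with `to_raw_iid` afterwards.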

```python
# Find the 10 nearest neighbors
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)
toy_story_neighbors
```

```text
[433, 101, 302, 309, 971, 95, 26, 561, 816, 347]
```

```python
# Map the neighbors' inner ids back to movie names
toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
                       for inner_id in toy_story_neighbors)
toy_story_neighbors = (rid_to_name[rid]
                       for rid in toy_story_neighbors)

print()
print('The 10 nearest neighbors of Toy Story are:')
for movie in toy_story_neighbors:
    print(movie)
```

```text
The 10 nearest neighbors of Toy Story are:
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
```


#### 1.2 Music prediction example

```python
from __future__ import (absolute_import, division, print_function, unicode_literals)
import os
import io
import pickle

from surprise import KNNBaseline, Reader
from surprise import Dataset

# Rebuild the playlist id -> playlist name mapping
with open("data/popular_playlist.pkl", "rb") as f:
    id_name_dic = pickle.load(f)
print("Loaded the playlist id -> playlist name mapping...")
# Rebuild the playlist name -> playlist id mapping
name_id_dic = {}
for playlist_id in id_name_dic:
    name_id_dic[id_name_dic[playlist_id]] = playlist_id
print("Loaded the playlist name -> playlist id mapping...")


file_path = os.path.expanduser('data/popular_music_suprise_format.txt')
# Specify the file format
reader = Reader(line_format='user item rating timestamp', sep=',')
# Read the data from the file
music_data = Dataset.load_from_file(file_path, reader=reader)
# Build the full training set used to compute similarities
print("Building the dataset...")
trainset = music_data.build_full_trainset()
#sim_options = {'name': 'pearson_baseline', 'user_based': False}
```

```text
Loaded the playlist id -> playlist name mapping...
Loaded the playlist name -> playlist id mapping...
Building the dataset...
```


```python
list(id_name_dic.keys())[2]
```

```text
'69545352'
```

```python
print(id_name_dic[list(id_name_dic.keys())[2]])
```

```text
五月——奔跑吧，青春
```

```python
trainset.n_items
```

```text
130573
```

```python
trainset.n_users
```

```text
3771
```

#### 1.2.1 Template: finding the nearest users (here, playlists)

```python
print("Training the model...")
#sim_options = {'user_based': False}
#algo = KNNBaseline(sim_options=sim_options)
algo = KNNBaseline()
algo.train(trainset)

current_playlist = list(name_id_dic.keys())[39]
print("Playlist name:", current_playlist)

# Retrieve the neighbors
# Map the name to its id
playlist_id = name_id_dic[current_playlist]
print("Playlist id:", playlist_id)
# Get the corresponding inner user id => to_inner_uid
playlist_inner_id = algo.trainset.to_inner_uid(playlist_id)
print("Inner id:", playlist_inner_id)

playlist_neighbors = algo.get_neighbors(playlist_inner_id, k=10)

# Convert the playlist ids to playlist names
# to_raw_uid maps inner ids back to raw ids
playlist_neighbors = (algo.trainset.to_raw_uid(inner_id)
                       for inner_id in playlist_neighbors)
playlist_neighbors = (id_name_dic[playlist_id]
                       for playlist_id in playlist_neighbors)

print()
print("The 10 playlists closest to 《", current_playlist, "》 are:\n")
for playlist in playlist_neighbors:
    print(playlist, algo.trainset.to_inner_uid(name_id_dic[playlist]))
```

```text
Training the model...
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Playlist name: 中年——给爸妈广场必备
Playlist id: 72048599
Inner id: 39

The 10 playlists closest to 《 中年——给爸妈广场必备 》 are:

奥迪A8车载 DJ 第一季 77
适合男生唱的100首歌 96
程一电台音乐歌单-华语 110
KTV麦霸练习之催泪 111
70后经典歌集 149
歌唱祖国-走向复兴 162
漫无止境的夏天 171
奔跑吧『节目bgm』 172
那些翻唱日语的华语歌 188
华语怀旧|||那些年爸妈听的歌 229
```


#### 1.2.2 Template: making predictions for a user

```python
import pickle
# Rebuild the song id -> song name mapping
with open("data/popular_song.pkl", "rb") as f:
    song_id_name_dic = pickle.load(f)
print("Loaded the song id -> song name mapping...")
# Rebuild the song name -> song id mapping
song_name_id_dic = {}
for song_id in song_id_name_dic:
    song_name_id_dic[song_id_name_dic[song_id]] = song_id
print("Loaded the song name -> song id mapping...")
```

```text
Loaded the song id -> song name mapping...
Loaded the song name -> song id mapping...
```


```python
# The user with inner id 4
user_inner_id = 4
user_rating = trainset.ur[user_inner_id]
items = map(lambda x: x[0], user_rating)
for song in items:
    # Inner ids are passed here; predict() is documented as taking raw ids
    # (the NMF example below converts with to_raw_uid / to_raw_iid)
    print(algo.predict(user_inner_id, song, r_ui=1), song_id_name_dic[algo.trainset.to_raw_iid(song)])
```

```text
user: 4          item: 361        r_ui = 1.00   est = 5.00   {'was_impossible': False} 家	许巍
user: 4          item: 362        r_ui = 1.00   est = 5.00   {'was_impossible': False} 老街	李荣浩
user: 4          item: 363        r_ui = 1.00   est = 5.00   {'was_impossible': False} 滴答	侃侃
user: 4          item: 364        r_ui = 1.00   est = 5.00   {'was_impossible': False} 彩虹	周杰伦
user: 4          item: 365        r_ui = 1.00   est = 5.00   {'was_impossible': False} 米店	张玮玮和郭龙
user: 4          item: 366        r_ui = 1.00   est = 5.00   {'was_impossible': False} 情人	Beyond
user: 4          item: 367        r_ui = 1.00   est = 5.00   {'was_impossible': False} 喜欢你	Beyond
user: 4          item: 220        r_ui = 1.00   est = 5.00   {'was_impossible': False} 灰姑娘	郑钧
user: 4          item: 235        r_ui = 1.00   est = 5.00   {'was_impossible': False} 安和桥	宋冬野
user: 4          item: 240        r_ui = 1.00   est = 5.00   {'was_impossible': False} 去大理	郝云
user: 4          item: 368        r_ui = 1.00   est 
```
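Every estimate above is exactly 5.00. One plausible explanation, offered as an assumption rather than something verified against this data: `Reader` uses a default `rating_scale` of (1, 5), and Surprise clips estimates to that scale, while the raw ratings printed later for this file go up to at least 12. The sketch below shows the clipping step in isolation. `clip_estimate` is a hypothetical helper and the (1, 12) scale is only an example.

```python
def clip_estimate(est, rating_scale=(1, 5)):
    """Force an estimate into the [low, high] range of the rating scale."""
    low, high = rating_scale
    return min(high, max(low, est))


# With the default scale, any estimate above 5 collapses to 5
print(clip_estimate(9.7))
# With a wider (example) scale the estimate is left untouched
print(clip_estimate(9.7, rating_scale=(1, 12)))
```

If this is what happens here, constructing the `Reader` with a `rating_scale` that matches the data would be the setting to look at.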

### 2. Prediction with matrix factorization

```python
### Using NMF
import os

from surprise import NMF
from surprise import Dataset, Reader

file_path = os.path.expanduser('data/popular_music_suprise_format.txt')
# Specify the file format
reader = Reader(line_format='user item rating timestamp', sep=',')
# Read the data from the file
music_data = Dataset.load_from_file(file_path, reader=reader)
# Build the dataset and train the model
algo = NMF()
trainset = music_data.build_full_trainset()
algo.train(trainset)
```

```text
<surprise.prediction_algorithms.matrix_factorization.NMF at 0x25841d1b0f0>
```

```python
user_inner_id = 4
user_rating = trainset.ur[user_inner_id]
items = map(lambda x: x[0], user_rating)
for song in items:
    print(algo.predict(algo.trainset.to_raw_uid(user_inner_id), algo.trainset.to_raw_iid(song), r_ui=1), song_id_name_dic[algo.trainset.to_raw_iid(song)])
```

```text
user: 69758545   item: 167751     r_ui = 1.00   est = 5.00   {'was_impossible': False} 家	许巍
user: 69758545   item: 133998     r_ui = 1.00   est = 5.00   {'was_impossible': False} 老街	李荣浩
user: 69758545   item: 25638325   r_ui = 1.00   est = 5.00   {'was_impossible': False} 滴答	侃侃
user: 69758545   item: 185809     r_ui = 1.00   est = 5.00   {'was_impossible': False} 彩虹	周杰伦
user: 69758545   item: 26494698   r_ui = 1.00   est = 5.00   {'was_impossible': False} 米店	张玮玮和郭龙
user: 69758545   item: 347355     r_ui = 1.00   est = 5.00   {'was_impossible': False} 情人	Beyond
user: 69758545   item: 346073     r_ui = 1.00   est = 5.00   {'was_impossible': False} 喜欢你	Beyond
user: 69758545   item: 186842     r_ui = 1.00   est = 5.00   {'was_impossible': False} 灰姑娘	郑钧
user: 69758545   item: 27646205   r_ui = 1.00   est = 5.00   {'was_impossible': False} 安和桥	宋冬野
user: 69758545   item: 28977819   r_ui = 1.00   est = 5.00   {'was_impossible': False} 去大理	郝云
user: 69758545   item: 65538      r_ui = 1.00   est 
```
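For intuition about what a matrix factorization model such as NMF predicts: each user and each item is represented by a vector of non-negative latent factors, and the (unbiased) estimate is their dot product. The sketch below uses made-up factor vectors. The names `p_u` and `q_i` follow the usual notation and are not read from the trained model.

```python
def predict_rating(user_factors, item_factors):
    """Dot product of a user's and an item's latent factor vectors."""
    return sum(p * q for p, q in zip(user_factors, item_factors))


# Hypothetical non-negative factors for one user and one item
p_u = [0.5, 1.0, 0.0]
q_i = [2.0, 1.5, 0.75]

# 0.5 * 2.0 + 1.0 * 1.5 + 0.0 * 0.75
print(predict_rating(p_u, q_i))
```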

## Saving the model

```python
import surprise
# Save the trained algorithm to disk
surprise.dump.dump('data/recommendation.model', algo=algo)
# Load it back; load() returns a (predictions, algo) tuple
_, algo = surprise.dump.load('data/recommendation.model')
```
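Saving and loading a model is essentially a pickle round trip. The following standalone sketch shows the same idea with a plain dictionary standing in for the trained algorithm. The `model` contents and the temporary path are made up for illustration.

```python
import os
import pickle
import tempfile

# A stand-in for a trained model (hypothetical contents)
model = {"name": "toy-model", "factors": [0.1, 0.2, 0.3]}

# Write the object to a file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "recommendation.model")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read it back and check that nothing was lost
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

As with any pickle file, only load model files that come from a source you trust.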

## Evaluating different recommendation algorithms

### First, load the data

```python
import os
from surprise import Reader, Dataset
# Path to the data file
file_path = os.path.expanduser('data/popular_music_suprise_format.txt')
# Specify the file format
reader = Reader(line_format='user item rating timestamp', sep=',')
# Read the data from the file
music_data = Dataset.load_from_file(file_path, reader=reader)
# Split into 5 folds
music_data.split(n_folds=5)
```

```python
music_data
```

```text
<surprise.dataset.DatasetAutoFolds at 0x2585aafa470>
```

```python
music_data.raw_ratings[:20]
```

```text
[('31708517', '32019128', 3.0, '1300000'),
 ('87872451', '36190599', 7.0, '1300000'),
 ('108717850', '4878308', 11.0, '1300000'),
 ('325606188', '30953301', 11.0, '1300000'),
 ('450038411', '31062979', 12.0, '1300000'),
 ('384483907', '368760', 11.0, '1300000'),
 ('10032711', '5280086', 9.0, '1300000'),
 ('331104108', '184544', 11.0, '1300000'),
 ('38127319', '186443', 9.0, '1300000'),
 ('157290336', '416385492', 10.0, '1300000'),
 ('365123743', '34144434', 12.0, '1300000'),
 ('366850364', '82525', 9.0, '1300000'),
 ('134325190', '28830157', 12.0, '1300000'),
 ('50271609', '210842', 11.0, '1300000'),
 ('155013989', '28853096', 12.0, '1300000'),
 ('596346864', '419077073', 11.0, '1300000'),
 ('56664203', '29724292', 11.0, '1300000'),
 ('135682470', '26241459', 7.0, '1300000'),
 ('48742458', '106368', 12.0, '1300000'),
 ('68186334', '334306', 3.0, '1300000')]
```

```python
### Using NormalPredictor
from surprise import NormalPredictor, evaluate
algo = NormalPredictor()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm NormalPredictor.

------------
Fold 1
RMSE: 5.4390
MAE:  4.9682
------------
Fold 2
RMSE: 5.4478
MAE:  4.9763
------------
Fold 3
RMSE: 5.4506
MAE:  4.9813
------------
Fold 4
RMSE: 5.4381
MAE:  4.9687
------------
Fold 5
RMSE: 5.4566
MAE:  4.9881
------------
------------
Mean RMSE: 5.4464
Mean MAE : 4.9765
------------
------------
```




```python
### Using BaselineOnly
from surprise import BaselineOnly, evaluate
algo = BaselineOnly()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm BaselineOnly.

------------
Fold 1
Estimating biases using als...
RMSE: 5.3604
MAE:  4.8951
------------
Fold 2
Estimating biases using als...
RMSE: 5.3671
MAE:  4.9019
------------
Fold 3
Estimating biases using als...
RMSE: 5.3700
MAE:  4.9066
------------
Fold 4
Estimating biases using als...
RMSE: 5.3605
MAE:  4.8972
------------
Fold 5
Estimating biases using als...
RMSE: 5.3730
MAE:  4.9100
------------
------------
Mean RMSE: 5.3662
Mean MAE : 4.9022
------------
------------
```


```python
### Using basic collaborative filtering
from surprise import KNNBasic, evaluate
algo = KNNBasic()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm KNNBasic.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3596
MAE:  4.8894
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3663
MAE:  4.8961
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3692
MAE:  4.9011
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3593
MAE:  4.8904
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3723
MAE:  4.9051
------------
------------
Mean RMSE: 5.3653
Mean MAE : 4.8964
------------
------------
```




```python
### Using collaborative filtering with means
from surprise import KNNWithMeans, evaluate
algo = KNNWithMeans()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm KNNWithMeans.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3685
MAE:  4.9148
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3756
MAE:  4.9218
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3782
MAE:  4.9268
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3687
MAE:  4.9164
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3810
MAE:  4.9297
------------
------------
Mean RMSE: 5.3744
Mean MAE : 4.9219
------------
------------
```


```python
### Using collaborative filtering with baselines
from surprise import KNNBaseline, evaluate
algo = KNNBaseline()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm KNNBaseline.

------------
Fold 1
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3614
MAE:  4.8965
------------
Fold 2
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3683
MAE:  4.9037
------------
Fold 3
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3710
MAE:  4.9081
------------
Fold 4
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3616
MAE:  4.8986
------------
Fold 5
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 5.3737
MAE:  4.9111
------------
------------
Mean RMSE: 5.3672
Mean MAE : 4.9036
------------
------------
```


```python
### Using SVD
from surprise import SVD, evaluate
algo = SVD()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```text
Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 5.3598
MAE:  4.8929
------------
Fold 2
RMSE: 5.3666
MAE:  4.9000
------------
Fold 3
RMSE: 5.3693
MAE:  4.9043
------------
Fold 4
RMSE: 5.3599
MAE:  4.8947
------------
Fold 5
RMSE: 5.3722
MAE:  4.9070
------------
------------
Mean RMSE: 5.3655
Mean MAE : 4.8998
------------
------------
```


```python
# ### Using SVD++ (left commented out)
# from surprise import SVDpp, evaluate
# algo = SVDpp()
# perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
```

```python
### Using NMF
from surprise import NMF
algo = NMF()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
print_perf(perf)
```

```text
Evaluating RMSE, MAE of algorithm NMF.

------------
Fold 1
RMSE: 5.3873
MAE:  4.9696
------------
Fold 2
RMSE: 5.3955
MAE:  4.9799
------------
Fold 3
RMSE: 5.3990
MAE:  4.9872
------------
Fold 4
RMSE: 5.3883
MAE:  4.9728
------------
Fold 5
RMSE: 5.4013
MAE:  4.9875
------------
------------
Mean RMSE: 5.3943
Mean MAE : 4.9794
------------
------------
        Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    
RMSE    5.3873  5.3955  5.3990  5.3883  5.4013  5.3943  
MAE     4.9696  4.9799  4.9872  4.9728  4.9875  4.9794  
```


