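### Prerequisites

##### Similarity formulas

Throughout, $N(i)$ denotes the set of users who interacted with item $i$ (or, for user similarity, the set of items that user $i$ interacted with).

Jaccard similarity

$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$

Cosine similarity (for binary interaction vectors the dot product $N(i) \cdot N(j)$ equals $|N(i) \cap N(j)|$)

$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)||N(j)|}}$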
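
As a minimal sketch, both similarities can be computed directly on Python sets. The helper names `jaccard` and `cosine` and the toy user sets are illustrative assumptions, not part of the notebook's code.

```python
import math

def jaccard(a: set, b: set) -> float:
    """|a ∩ b| / |a ∪ b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a: set, b: set) -> float:
    """|a ∩ b| / sqrt(|a| * |b|); defined here as 0.0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical toy data: users who interacted with item i and with item j
users_of_i = {"u1", "u2", "u3", "u4"}
users_of_j = {"u3", "u4", "u5"}

print(jaccard(users_of_i, users_of_j))  # 2 / 5 = 0.4
print(cosine(users_of_i, users_of_j))   # 2 / sqrt(12), roughly 0.577
```

The two measures share a numerator and differ only in how they normalize: Jaccard divides by the size of the union, cosine by the geometric mean of the two set sizes.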

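##### Model evaluation

In the formulas below, $R(u)$ is the list of items recommended to user $u$, $T(u)$ is the set of items the user actually consumed in the test set, $U$ is the set of users, and $I$ is the set of all items.

Recall: the fraction of the items users actually consumed that appear in the recommendations.

$Recall = \frac{\sum_{u}|R(u) \cap T(u)|}{\sum_{u}|T(u)|}$

Precision: the fraction of recommended items that users actually consumed.

$Precision = \frac{\sum_{u}|R(u) \cap T(u)|}{\sum_{u}|R(u)|}$

Coverage: the fraction of the item catalogue that appears in at least one recommendation list.

$Coverage = \frac{|\bigcup_{u \in U} R(u)|}{|I|}$

Novelty: the average popularity of the recommended items is used as a proxy for novelty. If the recommended items are all popular, average popularity is high and novelty is low.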
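
The four metrics can be checked by hand on a tiny example. The sketch below uses made-up recommendation lists, test sets, and training counts (all hypothetical). It measures popularity as the mean of log(1 + training count), which is how the notebook's `Metric.popularity` computes it.

```python
import math

# Hypothetical toy data: R[u] is the recommended list, T[u] the held-out items
R = {"u1": ["a", "b", "c"], "u2": ["a", "d", "e"]}
T = {"u1": {"a", "c"}, "u2": {"b"}}
all_items = {"a", "b", "c", "d", "e", "f", "g", "h"}      # the catalogue I
train_counts = {"a": 5, "b": 3, "c": 1, "d": 1, "e": 1}  # training interactions per item

hits = sum(len(set(R[u]) & T[u]) for u in T)
precision = hits / sum(len(R[u]) for u in T)
recall = hits / sum(len(T[u]) for u in T)
coverage = len(set().union(*R.values())) / len(all_items)
# Mean log-popularity of the recommended items (higher means less novel)
n_recommended = sum(len(r) for r in R.values())
popularity = sum(math.log(1 + train_counts[i]) for r in R.values() for i in r) / n_recommended

print(precision, recall, coverage)  # 2/6, 2/3, 5/8
```

Precision and recall share the same hit count and differ only in the denominator, so with a fixed list length N they move together across algorithms.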

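### Traditional collaborative filtering models

##### Random recommendation

Recommend N random items the user has not consumed. (It has no predictive accuracy and serves only as a baseline for the evaluation metrics.)

##### Popularity-based recommendation

Recommend the N most popular items the user has not consumed. It is commonly used as a fallback to fill out the candidates produced by other recall algorithms.

##### ItemCF

Item-based collaborative filtering recommends items similar to those the user liked before.

1. Build the user-item co-occurrence matrix from user behaviour data.
2. Compute the item similarity matrix from the co-occurrence matrix.

- The fraction of users who like item i that also like item j, where $N(i)$ is the set of users who like item i:
$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}$

- If j is a popular item that almost everyone likes, the formula above is close to 1 for every i, so `popular items j are penalized`:
$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)||N(j)|}}$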
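
A small worked example shows why the penalty matters. In the hypothetical data below, the item `"hot"` is liked by every user. The function names `conditional` and `penalized` are illustrative labels for the two formulas above.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: user -> set of liked items; "hot" is liked by everyone
train = {
    "u1": {"hot", "a"},
    "u2": {"hot", "a", "b"},
    "u3": {"hot", "b"},
    "u4": {"hot"},
}

count = defaultdict(int)                    # |N(i)|
co = defaultdict(lambda: defaultdict(int))  # |N(i) ∩ N(j)|
for items in train.values():
    for i in items:
        count[i] += 1
    for i, j in permutations(items, 2):
        co[i][j] += 1

def conditional(i, j):
    """First formula: share of i's users who also like j."""
    return co[i][j] / count[i]

def penalized(i, j):
    """Second formula: co-occurrence normalized by both items' popularity."""
    return co[i][j] / math.sqrt(count[i] * count[j])

print(conditional("a", "hot"), conditional("a", "b"))  # 1.0 0.5
print(penalized("a", "hot"), penalized("a", "b"))      # roughly 0.707 and 0.5
```

Under the first formula `"hot"` looks twice as similar to `"a"` as `"b"` does. After the penalty the gap narrows, because the score for `"hot"` is scaled down by its own popularity.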

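##### ItemCF-IUF

- An active user should contribute less to item similarity than an inactive user, so an IUF (Inverse User Frequency) term is added to `penalize active users`. Here $N(u)$ is the set of items user u has interacted with.
$w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{1}{\log(1+|N(u)|)}}{\sqrt{|N(i)||N(j)|}}$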
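
To make the IUF term concrete, the sketch below compares a casual user and a heavy user who both liked items `x` and `y`. The data and the helper name `iuf_weight` are hypothetical.

```python
import math

# Hypothetical toy data: user -> set of liked items
train = {
    "casual": {"x", "y"},
    "heavy": {"x", "y", "a", "b", "c", "d", "e", "f"},
}

def iuf_weight(n_items: int) -> float:
    """Contribution of one co-occurring user: 1 / log(1 + |N(u)|)."""
    return 1.0 / math.log(1 + n_items)

n_x = sum("x" in items for items in train.values())  # |N(x)|
n_y = sum("y" in items for items in train.values())  # |N(y)|
common = [u for u, items in train.items() if {"x", "y"} <= items]

w_plain = len(common) / math.sqrt(n_x * n_y)
w_iuf = sum(iuf_weight(len(train[u])) for u in common) / math.sqrt(n_x * n_y)

print(iuf_weight(2), iuf_weight(8))  # the casual user counts roughly twice as much
print(w_plain, w_iuf)
```

Plain ItemCF counts both users equally. With IUF, the heavy user's co-occurrence carries about half the weight of the casual user's, since 1/log(9) is half of 1/log(3).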

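##### ItemCF-Weight

- Each co-occurrence is additionally weighted by click position, click time, and item creation time (the closer two clicks are in position and in time, the larger the numerator and therefore the larger w).

$w_{ij} = \frac{\sum_{u \in N(i) \cap N(j)} \frac{W_{loc} \cdot W_{time} \cdot W_{ctime}}{\log(1+|N(u)|)}}{\sqrt{|N(i)||N(j)|}}$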
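
The sketch below isolates the position and click-time weights, using the same constants as the notebook's `itemWeight` function (1.0 forward, 0.7 backward, decay 0.9, time base 0.7). The function signatures are illustrative assumptions. It also shows a point that depends on the time unit: if timestamps are in seconds, as they appear to be in the MovieLens ratings file, `0.7 ** |Δt|` underflows to 0 for clicks even an hour apart, so the time weight is then 1.

```python
import math

def loc_weight(loc1: int, loc2: int, forward: float = 1.0,
               backward: float = 0.7, decay: float = 0.9) -> float:
    """Position weight: decays with the distance between two clicks in a user's sequence."""
    alpha = forward if loc2 > loc1 else backward
    return alpha * decay ** (abs(loc2 - loc1) - 1)

def click_time_weight(t1: float, t2: float, base: float = 0.7) -> float:
    """Time weight exp(base ** |t1 - t2|): between 1 and e, largest for simultaneous clicks."""
    return math.exp(base ** abs(t1 - t2))

print(loc_weight(0, 1))             # adjacent, forward order: 1.0
print(loc_weight(3, 0))             # three positions back: 0.7 * 0.9 ** 2
print(click_time_weight(0, 0))      # simultaneous clicks: e
print(click_time_weight(0, 3600))   # one hour apart, in seconds: 1.0
```

If the time weight is meant to discriminate between pairs, one option is to rescale the time difference (for example to days) before exponentiating. That would be a design change, not something the notebook does.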

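##### UserCF

User-based collaborative filtering recommends items liked by users whose interests are similar to the target user's.

1. Build the item-to-users inverted table from user behaviour data.
2. Compute the user similarity matrix from that table.

- The fraction of the items user i likes that user j also likes, where $N(i)$ is the set of items user i likes:
$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}$

- If j is an active user who likes almost everything, the formula above is close to 1 for every i, so `active users j are penalized`:
$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)||N(j)|}}$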
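
The two steps can be sketched on toy data as follows. The point of the inverted table is that only users who share at least one item are ever compared. The variable names and data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: user -> set of liked items
train = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"c", "d"},
}

# Step 1: invert to item -> users
item_users = defaultdict(set)
for user, items in train.items():
    for item in items:
        item_users[item].add(user)

# Step 2: count common items per user pair, then normalize by both users' activity
common = defaultdict(lambda: defaultdict(int))
for users in item_users.values():
    for u, v in permutations(users, 2):
        common[u][v] += 1

sim = {
    u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in nbrs.items()}
    for u, nbrs in common.items()
}

print(sim["u1"])  # only u2 appears; u1 and u3 share no item
```

Users with no item in common never get an entry, which keeps the similarity table sparse.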

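##### User-IIF

Take books as an example. Two users both buying the *Xinhua Dictionary* says almost nothing about whether their interests are similar, but both buying *Introduction to Data Mining* suggests that they are. Two users taking the same action on an unpopular item is stronger evidence of similar interests.

$w_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log(1+|N(i)|)}}{\sqrt{|N(u)||N(v)|}}$

The term $\frac{1}{\log(1+|N(i)|)}$, where $N(i)$ is the set of users who like item i, reduces the influence of popular items in the common interest list of users u and v on their similarity.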
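
The book example can be reproduced numerically. In this hypothetical data set everyone owns the dictionary, while only two users own the data mining book. The helper names `contribution` and `user_iif` are illustrative.

```python
import math

# Hypothetical toy data: user -> set of purchased books
train = {
    "u": {"dictionary", "data_mining"},
    "v": {"dictionary", "data_mining"},
    "w": {"dictionary", "novel"},
    "x": {"dictionary", "cookbook"},
}

# |N(i)|: number of users who bought each book
item_count = {}
for items in train.values():
    for i in items:
        item_count[i] = item_count.get(i, 0) + 1

def contribution(i):
    """Weight of one shared item: 1 / log(1 + |N(i)|)."""
    return 1.0 / math.log(1 + item_count[i])

def user_iif(u, v):
    """User-IIF similarity between users u and v."""
    shared = train[u] & train[v]
    return sum(contribution(i) for i in shared) / math.sqrt(len(train[u]) * len(train[v]))

print(contribution("dictionary"), contribution("data_mining"))
print(user_iif("u", "v"), user_iif("u", "w"))
```

The shared dictionary contributes less than the shared data mining book, so `u` and `v` end up noticeably more similar than `u` and `w`, who share only the dictionary.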

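##### Friend recommendation (an extension)

Here $N(i)$ is the set of friends of user i.

- Simple weighting by the number of common friends: $w_{ij} = |N(i) \cap N(j)|$
- Penalizing active users: $w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$
- Penalizing common friends who are themselves active: $w_{ij} = \sum_{k \in N(i) \cap N(j)} \frac{1}{|N(k)|}$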
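
The three scores can be compared on a small, hypothetical friendship graph. The function names are illustrative, and the graph is assumed to be undirected.

```python
# Hypothetical toy friendship graph: user -> set of friends (undirected)
friends = {
    "a": {"c", "d", "hub"},
    "b": {"c", "hub"},
    "c": {"a", "b"},
    "d": {"a", "hub"},
    "hub": {"a", "b", "d", "e", "f"},
    "e": {"hub"},
    "f": {"hub"},
}

def common_friends(i, j):
    """Number of common friends."""
    return len(friends[i] & friends[j])

def jaccard(i, j):
    """Common friends divided by the size of the combined friend set."""
    union = friends[i] | friends[j]
    return len(friends[i] & friends[j]) / len(union) if union else 0.0

def degree_weighted(i, j):
    """Sum of 1/|N(k)| over common friends k: well-connected friends count less."""
    return sum(1.0 / len(friends[k]) for k in friends[i] & friends[j])

print(common_friends("a", "b"))   # 2 (c and hub)
print(jaccard("a", "b"))          # 2 / 3
print(degree_weighted("a", "b"))  # 1/2 + 1/5
```

In the last score, the common friend `hub` (five friends) contributes 0.2 while `c` (two friends) contributes 0.5, reflecting that sharing a highly connected friend is weaker evidence of a real tie.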

In [1]:
import os
import time
import math
import random
import numpy as np
import pandas as pd
from collections import defaultdict  
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in favour of tqdm.notebook.tqdm

from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings("ignore")

def timmer(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        res = func(*args, **kwargs)
        stop_time = time.time()
        print('Func %s, run time: %s' % (func.__name__, stop_time - start_time))
        return res

    return wrapper

In [2]:
class Dataset():
    def __init__(self, filepath):
        """Load the ratings file and build the dataset."""
        self.data = self.load_data(filepath)
    
    def load_data(self, filepath):
        # Use the filepath argument (not a global); a multi-character separator needs the Python parser engine
        data = pd.read_csv(filepath, names=['user_id', 'item_id', 'rating', 'click_timestamp'],
                           sep='::', engine='python')
        return data
    
    def split_data(self, M, k, seed=1024):
        """Split the data into M folds.
        :param M: number of folds
        :param k: index of the fold held out as the test set, k in [0, M)
        :return: train, test
        """        
        self.data = self.data.sample(frac=1, random_state=seed)
        self.data.reset_index(inplace=True, drop=True)
        train = self.data[self.data.index % M != k]
        test = self.data[self.data.index % M == k]
        
        def convert_dict(data):
            """Convert interaction rows to a per-user dictionary.
            :param data: DataFrame with user_id, item_id and click_timestamp columns
            :return: {user1: [(item1, time1), (item2, time2), ...]}
            """
            
            data = data.sort_values(['user_id', 'click_timestamp'])

            def make_item_time_pair(df):
                return list(zip(df['item_id'], df['click_timestamp']))

            # Select the columns with a list; tuple-style selection on a groupby fails on recent pandas
            user_item_time_df = data.groupby('user_id')[['item_id', 'click_timestamp']].apply(lambda x: make_item_time_pair(x))\
                                                                    .reset_index().rename(columns={0: 'item_time_list'})

            user_item_time_dict = dict(zip(user_item_time_df['user_id'], user_item_time_df['item_time_list']))
            return user_item_time_dict

        return convert_dict(train), convert_dict(test)

##### Random recommendation

In [3]:
@timmer
def random_rec(train, K, N):
    """Random recommendation.
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar users/items (unused here)
    :param N: number of items to recommend (top N)
    """
    num_dict = defaultdict(int) # item popularity counter
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc, (i, i_click_time) in enumerate(item_time_list):
            num_dict[i] += 1
            
    def get_sample_items(num_dict, seen_items, num=10):
        """Sample `num` distinct unseen items uniformly at random."""
        candidates = list(num_dict) # build the candidate list once instead of on every draw
        sampled_item_dict = {}
        while num:
            key = random.choice(candidates)
            if key not in sampled_item_dict and key not in seen_items:
                num -= 1
                sampled_item_dict[key] = num_dict[key]
        return sampled_item_dict.items()
            
    def get_recommendation(user):
        """Recommend N random items the user has not consumed."""
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        sample_items = get_sample_items(num_dict, seen_items, N)
        return sample_items
    
    return get_recommendation

##### Popularity-based recommendation

In [4]:
@timmer
def hot_rec(train, K, N):
    """Popularity-based recommendation.
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar users/items (unused here)
    :param N: number of items to recommend (top N)
    """
    num_dict = defaultdict(int) # item popularity counter
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc, (i, i_click_time) in enumerate(item_time_list):
            num_dict[i] += 1
            
    def get_recommendation(user):
        """Recommend the N most popular items the user has not consumed."""
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        hot_items = {k: num_dict[k] for k in num_dict.keys() if k not in seen_items}
        hot_items = [item for item in sorted(hot_items.items(), key=lambda x: x[1], reverse=True)]
        return hot_items[:N]
    
    return get_recommendation

##### ItemCF

In [5]:
@timmer
def itemCF(train, K, N):
    """ ItemCF
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar items to use per seed item
    :param N: number of items to recommend (top N)
    """
    # Compute the item similarity matrix
    sim_dict = {}
    num_dict = defaultdict(int) # item popularity counter
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc1, (i, i_click_time) in enumerate(item_time_list):
            num_dict[i] += 1
            sim_dict.setdefault(i, {})
            # Note: unlike itemIUF below, the self-pair (i == j) is not skipped, so every item
            # ends up with similarity 1 to itself and typically occupies one of its own top-K slots
            for loc2, (j, j_click_time) in enumerate(item_time_list):
                sim_dict[i].setdefault(j, 0)
                sim_dict[i][j] += 1
    for i in sim_dict:
        for j in sim_dict[i]:
            sim_dict[i][j] /= math.sqrt(num_dict[i] * num_dict[j]) # penalize popular items
    
    # Sort each item's neighbours by similarity, descending
    sorted_sim_dict = {k: list(sorted(v.items(), key=lambda x: x[1], reverse=True)) for k, v in sim_dict.items()}
    
    def get_recommendation(user):
        """Return the recommendations for the given user."""
        if user not in train:
            print('user not exist: ', user)
            return []
        rec_items = {}
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        for (item, _) in train[user]:
            for sim_item, _ in sorted_sim_dict[item][:K]:
                # skip items the user has already seen
                if sim_item in seen_items:
                    continue
                if sim_item not in rec_items:
                    rec_items[sim_item] = 0
                rec_items[sim_item] += sim_dict[item][sim_item]
        return list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))[:N]
    return get_recommendation

@timmer
def itemIUF(train, K, N):
    """ ItemCF with Inverse User Frequency
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar items to use per seed item
    :param N: number of items to recommend (top N)
    """
    # Compute the item similarity matrix
    sim_dict = {}
    num_dict = defaultdict(int) # item popularity counter
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc1, (i, i_click_time) in enumerate(item_time_list):
            num_dict[i] += 1
            sim_dict.setdefault(i, {})
            for loc2, (j, j_click_time) in enumerate(item_time_list):
                if i==j:
                    continue
                sim_dict[i].setdefault(j, 0)
                sim_dict[i][j] += 1 / math.log(1+len(item_time_list)) # penalize active users
    for i in sim_dict:
        for j in sim_dict[i]:
            sim_dict[i][j] /= math.sqrt(num_dict[i] * num_dict[j]) # penalize popular items
            
    # Sort each item's neighbours by similarity, descending
    sorted_sim_dict = {k: list(sorted(v.items(), key=lambda x: x[1], reverse=True)) for k, v in sim_dict.items()}
    
    def get_recommendation(user):
        """Return the recommendations for the given user."""
        if user not in train:
            print('user not exist: ', user)
            return []
        rec_items = {}
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        for (item, _) in train[user]:
            for sim_item, _ in sorted_sim_dict[item][:K]:
                # skip items the user has already seen
                if sim_item in seen_items:
                    continue
                if sim_item not in rec_items:
                    rec_items[sim_item] = 0
                rec_items[sim_item] += sim_dict[item][sim_item]
        return list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))[:N]
    return get_recommendation

@timmer
def itemWeight(train, K, N):
    """ ItemCF with Weight
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar items to use per seed item
    :param N: number of items to recommend (top N)
    """
    # Compute the item similarity matrix
    sim_dict = {}
    num_dict = defaultdict(int) # item popularity counter
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc1, (i, i_click_time) in enumerate(item_time_list):
            num_dict[i] += 1
            sim_dict.setdefault(i, {})
            for loc2, (j, j_click_time) in enumerate(item_time_list):
                if i==j:
                    continue
                # forward order (j clicked after i) gets full weight, reverse order is discounted
                loc_alpha = 1.0 if loc2 > loc1 else 0.7
                # position weight: decays with the distance between the two clicks
                loc_weight = loc_alpha * (0.9 ** (np.abs(loc2 - loc1) - 1))
                # click-time weight; if timestamps are in seconds, 0.7 ** |dt| is effectively 0
                # for all but near-simultaneous clicks, so this factor is then 1
                click_time_weight = np.exp(0.7 ** np.abs(i_click_time - j_click_time))
                # creation-time weight (disabled: this dataset has no item creation time)
                # created_time_weight = np.exp(0.8 ** np.abs(item_created_time_dict[i] - item_created_time_dict[j]))
                sim_dict[i].setdefault(j, 0)
                sim_dict[i][j] += loc_weight * click_time_weight / math.log(1+len(item_time_list)) # penalize active users
    for i in sim_dict:
        for j in sim_dict[i]:
            sim_dict[i][j] /= math.sqrt(num_dict[i] * num_dict[j]) # penalize popular items
            
    # Sort each item's neighbours by similarity, descending
    sorted_sim_dict = {k: list(sorted(v.items(), key=lambda x: x[1], reverse=True)) for k, v in sim_dict.items()}
    
    def get_recommendation(user):
        """Return the recommendations for the given user."""
        if user not in train:
            print('user not exist: ', user)
            return []
        rec_items = {}
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        for (item, _) in train[user]:
            for sim_item, _ in sorted_sim_dict[item][:K]:
                # skip items the user has already seen
                if sim_item in seen_items:
                    continue
                if sim_item not in rec_items:
                    rec_items[sim_item] = 0
                rec_items[sim_item] += sim_dict[item][sim_item]
        return list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))[:N]
    return get_recommendation

##### UserCF

In [6]:
@timmer
def userCF(train, K, N):
    """ userCF
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar users to use
    :param N: number of items to recommend (top N)
    """
    # Compute the user similarity matrix
    sim_dict = {}
    num_dict = defaultdict(int) # number of items per user
    
    # Build the item -> users inverted table
    item_users_dict = {}
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc, (i, i_click_time) in enumerate(item_time_list):
            if i not in item_users_dict:
                item_users_dict[i] = set()
            item_users_dict[i].add(user)
    
    # Count common items for every pair of users that share an item
    # Note: the self-pair (i == j) is not skipped, so every user has similarity 1 to
    # themselves and typically occupies one of their own top-K neighbour slots
    for item, users in tqdm(item_users_dict.items()):
        for i in users:
            num_dict[i] += 1
            sim_dict.setdefault(i, {})
            for j in users:
                sim_dict[i].setdefault(j, 0)
                sim_dict[i][j] += 1

    for i in sim_dict:
        for j in sim_dict[i]:
            sim_dict[i][j] /= math.sqrt(num_dict[i] * num_dict[j]) # penalize active users
    
    # Sort each user's neighbours by similarity, descending
    sorted_sim_dict = {k: list(sorted(v.items(), key=lambda x: x[1], reverse=True)) for k, v in sim_dict.items()}
    
    def get_recommendation(user):
        """Recommend N items based on the K users most similar to the given user."""
        if user not in train:
            print('user not exist: ', user)
            return []
        rvi = 1 # implicit feedback: every interaction counts as interest 1
        rec_items = {}
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        for v, wuv in sorted_sim_dict[user][:K]:
            for (item, _) in train[v]:
                if item in seen_items:
                    continue
                if item not in rec_items:
                    rec_items[item] = 0
                rec_items[item] += wuv * rvi
        return list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))[:N]

    return get_recommendation

@timmer
def userIIF(train, K, N):
    """ userIIF
    :param train: training set, {user: [(item, click_time), ...]}
    :param K: number of most similar users to use
    :param N: number of items to recommend (top N)
    """
    # Compute the user similarity matrix
    sim_dict = {}
    num_dict = defaultdict(int) # number of items per user
    
    # Build the item -> users inverted table
    item_users_dict = {}
    for user, item_time_list in tqdm(train.items()):
        # position, item ID, click time
        for loc, (i, i_click_time) in enumerate(item_time_list):
            if i not in item_users_dict:
                item_users_dict[i] = set()
            item_users_dict[i].add(user)
    
    # Accumulate IIF-weighted common items for every pair of users that share an item
    for item, users in tqdm(item_users_dict.items()):
        for i in users:
            num_dict[i] += 1
            sim_dict.setdefault(i, {})
            for j in users:
                sim_dict[i].setdefault(j, 0)
                sim_dict[i][j] += 1 / math.log(1+len(users)) # penalize popular items

    for i in sim_dict:
        for j in sim_dict[i]:
            sim_dict[i][j] /= math.sqrt(num_dict[i] * num_dict[j]) # penalize active users
    
    # Sort each user's neighbours by similarity, descending
    sorted_sim_dict = {k: list(sorted(v.items(), key=lambda x: x[1], reverse=True)) for k, v in sim_dict.items()}
    
    def get_recommendation(user):
        """Recommend N items based on the K users most similar to the given user."""
        if user not in train:
            print('user not exist: ', user)
            return []
        rvi = 1 # implicit feedback: every interaction counts as interest 1
        rec_items = {}
        seen_items = set([i[0] for i in train[user]]) # items the user has clicked
        for v, wuv in sorted_sim_dict[user][:K]:
            for (item, _) in train[v]:
                if item in seen_items:
                    continue
                if item not in rec_items:
                    rec_items[item] = 0
                rec_items[item] += wuv * rvi
        return list(sorted(rec_items.items(), key=lambda x: x[1], reverse=True))[:N]

    return get_recommendation

##### Model evaluation

In [7]:
class Metric():
    def __init__(self, train, test, get_recommendation):
        """Collect the recommendations for every test user and evaluate them."""
        self.train = train
        self.test = test
        self.get_recommendation = get_recommendation
        self.rec_result = self.get_rec_result()
    
    def get_rec_result(self):
        """Compute the recommendation list for every user in the test set."""
        rec_result = {}
        for user in tqdm(self.test):
            rec_result[user] = self.get_recommendation(user)
        return rec_result
    
    # train/test format: {user1: [(item1, time1), (item2, time2), ...]}
    def precision(self):
        """Precision (%): hits as a share of all recommended items."""
        _hit, _all = 0, 0
        for user in self.test:
            items = set([i[0] for i in self.test[user]])
            rank = self.rec_result[user]
            for item, _ in rank:
                if item in items:
                    _hit += 1
            _all += len(rank)
        return round(_hit / _all * 100, 2)
    
    def recall(self):
        """Recall (%): hits as a share of all items actually clicked in the test set."""
        _hit, _all = 0, 0
        for user in self.test:
            items = set([i[0] for i in self.test[user]])
            rank = self.rec_result[user]
            for item, _ in rank:
                if item in items:
                    _hit += 1
            _all += len(items)
        return round(_hit / _all * 100, 2)
    
    def coverage(self):
        """Coverage (%): distinct recommended items as a share of the items in the test users' training data."""
        all_items, rec_items = set(), set()
        for user in self.test:
            for item in set([i[0] for i in self.train[user]]):
                all_items.add(item)
            rank = self.rec_result[user]
            for item, _ in rank:
                rec_items.add(item)
        return round(len(rec_items) / len(all_items) * 100, 2)
        
    def popularity(self):
        """Popularity: mean of log(1 + training count) over all recommended items."""
        item_popularity_dict = {}
        for user in self.test:
            for item in [i[0] for i in self.train[user]]:
                if item not in item_popularity_dict:
                    item_popularity_dict[item] = 0
                item_popularity_dict[item] += 1
        
        _all, _p = 0, 0 # number of recommended items, accumulated log-popularity
        for user in self.test:
            rank = self.rec_result[user]
            for item, _ in rank:
                _p += math.log(1+item_popularity_dict[item])
                _all += 1
        return round(_p / _all, 6)
        
    def eval(self):
        """Evaluate all metrics on the test set."""
        model_metric = {
            'Precision': self.precision(),
            'Recall': self.recall(),
            'Coverage': self.coverage(),
            'Popularity': self.popularity(),
        }
        print('Metric:', model_metric)
        return model_metric

class Experiment():
    def __init__(self, M, K, N, filepath, algname):
        """
        :param M: number of folds used for the split (run() evaluates only fold k=1)
        :param K: number of most similar users/items
        :param N: number of items to recommend (top N)
        :param filepath: path to the data file
        :param algname: name of the algorithm, a key of self.alg
        """
        self.M = M
        self.K = K
        self.N = N
        self.filepath = filepath
        self.algname = algname
        self.alg = {
            "Random": random_rec,
            "Hot": hot_rec,
            "ItemCF": itemCF,
            "ItemIUF": itemIUF,
            "ItemWeight": itemWeight,
            "UserCF": userCF,
            "UserIIF": userIIF
        }
    
    @timmer
    def single_run(self, train, test):
        """
        :param train: training set
        :param test: test set
        :return: dictionary of metrics
        """
        get_recommendation = self.alg[self.algname](train, self.K, self.N)
        metric = Metric(train, test, get_recommendation)
        return metric.eval()
    
    @timmer
    def run(self):
        dataset = Dataset(self.filepath)
        train, _test = dataset.split_data(self.M, k=1)
        # Keep only test users who also have history in the training set
        test = {}
        for k, v in _test.items():
            if k in train.keys():
                test[k] = _test[k]
            else:
                print('del test user: ', k)
        
        metric = self.single_run(train, test)
        return metric

In [11]:
# Split the data into 8 folds; recommend N items using the K most similar users/items
M, N, K = 8, 10, 10
file_path = '../data/ml-1m/ratings.dat'

# Evaluate every model
ALGS = ["Random", "Hot", "ItemCF", "ItemIUF", "ItemWeight", "UserCF", "UserIIF"]

for alg in ALGS:
    print(f'=============== {alg} START ===============')
    exp = Experiment(M, K, N, file_path, alg)
    exp.run()
    print(f'=============== {alg} END ===============\n')



Func random_rec, run time: 0.15259242057800293
Metric: {'Precision': 0.6, 'Recall': 0.29, 'Coverage': 100.0, 'Popularity': 4.405071}
Func single_run, run time: 2.457427501678467
Func run, run time: 7.701402425765991

Func hot_rec, run time: 0.1685488224029541
Metric: {'Precision': 12.97, 'Recall': 6.22, 'Coverage': 2.47, 'Popularity': 7.718132}
Func single_run, run time: 10.86371660232544
Func run, run time: 16.204662084579468

Func itemCF, run time: 111.91621017456055
Metric: {'Precision': 21.82, 'Recall': 10.46, 'Coverage': 19.54, 'Popularity': 7.224026}
Func single_run, run time: 116.18641924858093
Func run, run time: 121.39693021774292

Func itemIUF, run time: 208.0548927783966
Metric: {'Precision': 22.45, 'Recall': 10.76, 'Coverage': 17.67, 'Popularity': 7.343012}
Func single_run, run time: 212.98111820220947
Func run, run time: 218.3597617149353

Func itemWeight, run time: 2673.6163368225098
Metric: {'Precision': 21.01, 'Recall': 10.07, 'Coverage': 28.39, 'Popularity': 6.927734}
Func single_run, run time: 2681.159425973892
Func run, run time: 2686.8138329982758

Func userCF, run time: 169.69799041748047
Metric: {'Precision': 20.22, 'Recall': 9.7, 'Coverage': 42.99, 'Popularity': 6.959068}
Func single_run, run time: 175.29202723503113
Func run, run time: 180.80291604995728

Func userIIF, run time: 343.42542815208435
Metric: {'Precision': 19.89, 'Recall': 9.54, 'Coverage': 45.4, 'Popularity': 6.902531}
Func single_run, run time: 349.8010346889496
Func run, run time: 355.2654721736908

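##### Evaluation results

| Model | Scenario | Pros | Cons | Precision (%) | Recall (%) | Coverage (%) | Popularity |
| :-------------------: |:-------------------: |:-------------------: |:-------------------: |:-------------------: |:-------------------: |:-------------------: |:-------------------: |
| **Random** | - | - | - | 0.6 | 0.29 | 100.0 | 4.405 |
| **Hot** | Often used in multi-channel recall as a fallback to fill out the candidate set. | - | - | 12.97 | 6.22 | 2.47 | 7.718 |
| **UserCF** | Recommends based on user similarity; strongly social in nature; suits applications where interests change quickly, such as short video and news. | Simple and effective | Recommendations are dominated by head (popular) items; weak generalization | 20.22 | 9.7 | 42.99 | 6.959 |
| **UserCF-IIF** | - | - | - | 19.89 | 9.54 | 45.4 | 6.903 |
| **ItemCF** | Recommends based on item similarity; suits applications where interests are relatively stable, such as e-commerce, movies, and game levels. | Simple and effective | Recommendations are dominated by head (popular) items; weak generalization | 21.82 | 10.46 | 19.54 | 7.224 |
| **ItemCF-IUF** | - | - | - | 22.45 | 10.76 | 17.67 | 7.343 |
| **ItemCF-Weight** | - | - | - | 21.01 | 10.07 | 28.39 | 6.928 |

In this movie recommendation scenario (the ml-1m ratings data), ItemCF-IUF achieves the best precision and recall.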