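| Field            | Description                                                  |
| ---------------- | ------------------------------------------------------------ |
| user id          | Unique identifier of the user who produced the behavior      |
| item id          | Unique identifier of the object the behavior was applied to  |
| behavior type    | Kind of behavior (for example, purchase or view)             |
| context          | Context in which the behavior occurred, including time and location |
| behavior weight  | Weight of the behavior (for video viewing this can be the watch time; for rating it can be the score) |
| behavior content | Content of the behavior (for a comment, the comment text; for tagging, the tag) |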
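
A single behavior record with these fields could be represented as follows. This is only a minimal sketch: the class name `Behavior`, the default values, and the sample record are illustrative assumptions, not part of the original notes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Behavior:
    """One user-behavior record (hypothetical container; fields follow the table)."""
    user_id: str
    item_id: str
    behavior_type: str                                      # e.g. "purchase" or "view"
    context: Dict[str, Any] = field(default_factory=dict)   # time, location, ...
    weight: float = 1.0                                     # watch time, rating score, ...
    content: str = ""                                       # comment text, tag, ...


# Made-up sample: user u1 rates item i42 with a score of 4.
record = Behavior(
    user_id="u1",
    item_id="i42",
    behavior_type="rate",
    context={"time": "2020-01-01T12:00:00"},
    weight=4.0,
)
print(record)
```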

In [1]:
import random

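def SplitData(data, M, k, seed):
    # Split (user, item) pairs into M folds: fold k becomes the test set,
    # the remaining M - 1 folds become the training set.
    test = []
    train = []
    random.seed(seed)  # reuse the same seed for every k so the folds stay consistent
    for user, item in data:
        # randint is inclusive on both ends, so M - 1 yields exactly M folds
        if random.randint(0, M - 1) == k:
            test.append([user, item])
        else:
            train.append([user, item])
    return train, test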
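
A split like this is typically run M times, once per fold index k with the same seed, so that every fold serves once as the test set. The sketch below illustrates that loop under stated assumptions: `split_data` is a local, illustrative restatement of the split that draws the fold index from 0 to M - 1, and the data is made up.

```python
import random


def split_data(data, M, k, seed):
    """Assign each (user, item) pair to one of M folds; fold k is the test set."""
    rng = random.Random(seed)          # private generator, reproducible per seed
    train, test = [], []
    for user, item in data:
        if rng.randrange(M) == k:      # randrange(M) returns 0 .. M - 1
            test.append((user, item))
        else:
            train.append((user, item))
    return train, test


# Made-up interaction data: 40 (user, item) pairs.
data = [("u%d" % (i % 5), "i%d" % i) for i in range(40)]
M = 8
folds = [split_data(data, M, k, seed=42) for k in range(M)]

# With a fixed seed each pair always lands in the same fold,
# so the M test sets together cover the data exactly once.
total_test = sum(len(test) for _, test in folds)
print(total_test == len(data))
```

Averaging a metric over the M runs gives a less split-dependent estimate than a single train/test split.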

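#### Recall and Precision of a Recommender System

**Recall**: of the items in a user's test set, the fraction that the model recommended to that user.

**Precision**: of the items the model recommended to a user, the fraction that appear in that user's test set.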
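
As a small worked example of the two definitions, the sketch below computes both metrics by hand. The recommendation lists and test sets are made up for illustration.

```python
# Hypothetical top-3 recommendations and held-out test items per user.
recommended = {"A": ["a", "b", "c"], "B": ["a", "d", "e"]}
test = {"A": {"a", "c", "x"}, "B": {"d"}}

# A hit is a recommended item that also appears in the user's test set.
hits = sum(len(set(recommended[u]) & test[u]) for u in recommended)
n_recommended = sum(len(items) for items in recommended.values())
n_test = sum(len(test[u]) for u in recommended)

recall = hits / n_test            # 3 hits / 4 test items
precision = hits / n_recommended  # 3 hits / 6 recommended items
print(recall, precision)          # 0.75 0.5
```

Both metrics share the same numerator; only the denominator differs, which is why a longer recommendation list tends to raise recall while lowering precision.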

In [2]:
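def Recall(train, test, N):
    # GetRecommendation(user, N) is assumed to be defined elsewhere and to
    # return the top-N list of (item, predicted interest) pairs for the user.
    hit = 0
    total = 0  # renamed from `all` to avoid shadowing the builtin
    for user in train.keys():
        tu = test.get(user, {})  # users without test items contribute nothing
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += len(tu)
    return hit / (total * 1.0)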

In [3]:
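def Precision(train, test, N):
    # GetRecommendation(user, N) is assumed to be defined elsewhere and to
    # return the top-N list of (item, predicted interest) pairs for the user.
    hit = 0
    total = 0  # renamed from `all` to avoid shadowing the builtin
    for user in train.keys():
        tu = test.get(user, {})  # users without test items contribute nothing
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += N
    return hit / (total * 1.0)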

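#### Coverage

**Coverage**: the fraction of all items in the training set that appear in at least one user's recommendation list.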

In [4]:
def Coverage(train, test, N):
    recommend_items = set()
    all_items = set()
    for user in train.keys():
        for item in train[user].keys():
            all_items.add(item)
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            recommend_items.add(item)
    return len(recommend_items) / (len(all_items) * 1.0)

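#### Novelty
If the recommended items are all very popular, the novelty of the recommendations is low; otherwise the results are comparatively novel. Novelty is therefore measured here through the average popularity of the recommended items.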
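
To make the idea concrete, the sketch below compares the average log-popularity of two recommendation lists. The popularity counts and item names are made up, and `avg_log_popularity` is a hypothetical helper; a lower value indicates a more novel list.

```python
import math

# Hypothetical popularity counts: number of users who interacted with each item.
item_popularity = {"hit_song": 1000, "niche_song": 3}


def avg_log_popularity(recommended_items):
    """Mean of log(1 + popularity) over a recommendation list."""
    total = sum(math.log(1 + item_popularity[i]) for i in recommended_items)
    return total / len(recommended_items)


mainstream = avg_log_popularity(["hit_song", "hit_song"])
mixed = avg_log_popularity(["hit_song", "niche_song"])
print(mainstream > mixed)  # True: the all-popular list is less novel
```

Taking the logarithm keeps a handful of extremely popular items from dominating the average.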

In [7]:
import math

def Popularity(train, test, N):
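    # Popularity of an item = number of users who interacted with it in training.
    item_popularity = dict()
    for user, items in train.items():
        for item in items.keys():  # iterate over this user's items
            if item not in item_popularity:
                item_popularity[item] = 0
            item_popularity[item] += 1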
    
    ret = 0
    n = 0
    for user in train.keys():
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            ret += math.log(1+item_popularity[item])
            n += 1
    ret /= n*1.0
    return ret

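### Collaborative Filtering

1. Find the set of users whose interests are similar to the target user's.
2. From the items liked by the users in this set, recommend those the target user has not yet encountered.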
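
The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `train` is assumed to map each user to a set of items, the function name `recommend` and the parameters `K` (number of neighbours) and `N` (list length) are hypothetical, and similarity is taken to be the shared-item count normalized by the geometric mean of the two set sizes. Any other similarity measure could be substituted.

```python
import math
from collections import defaultdict


def recommend(user, train, K=2, N=5):
    """Score unseen items by the summed similarity of the K most similar users."""
    seen = train[user]

    # Step 1: similarity between the target user and every other user.
    sims = {}
    for v, items in train.items():
        if v == user:
            continue
        common = len(seen & items)
        if common:  # skip users with no overlap (also avoids division by zero)
            sims[v] = common / math.sqrt(len(seen) * len(items))
    neighbours = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:K]

    # Step 2: items the neighbours like that the target user has not seen.
    scores = defaultdict(float)
    for v, w in neighbours:
        for item in train[v] - seen:
            scores[item] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:N]


# Made-up data: each user maps to the set of items they interacted with.
train = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
print(recommend("A", train, K=3))
```

For user A, only items `c` and `e` can be recommended, since every other item is already in A's history.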

#### Interest Similarity (Jaccard)
Here $N(u)$ denotes the set of items that user $u$ has interacted with.
$$w_{uv} = \frac{|N(u)\cap N(v)|}{|N(u) \cup N(v)|}$$
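
As a quick check of this formula, the sketch below evaluates it on two small item sets. The helper name `jaccard` and the sample sets are illustrative assumptions.

```python
def jaccard(items_u, items_v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)| for two item sets; 0.0 if both are empty."""
    union = items_u | items_v
    if not union:
        return 0.0
    return len(items_u & items_v) / len(union)


# One shared item ("a") out of four distinct items.
print(jaccard({"a", "b", "d"}, {"a", "c"}))  # 0.25
```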

#### Cosine Similarity
$$w_{uv} = \frac{|N(u)\cap N(v)|}{\sqrt{|N(u)||N(v)|}}$$

In [8]:
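def UserSimilarity(train):
    # Cosine similarity between every pair of users;
    # train maps each user to the collection of items they interacted with.
    W = dict()
    for u in train.keys():
        W[u] = dict()  # create the row before assigning W[u][v]
        for v in train.keys():
            if u == v:
                continue
            # set() lets this work whether train[u] is a set or a dict of items
            W[u][v] = len(set(train[u]) & set(train[v]))
            W[u][v] /= math.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W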