# ItemCF
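+ ItemCF is analogous to UserCF. In UserCF we compute the similarity between Alice and the other users, then predict Alice's rating of item5 from her rating history combined with the other users' ratings of item5.
+ In ItemCF we instead compute the correlation between item5 and the other items, pick the most highly correlated items, and predict Alice's rating of item5 as a weighted combination anchored on the average rating that users have given item5.
> The cells below simply call library functions to compute the similarities.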
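
The weighted prediction described above can be written roughly as follows. This is a sketch in notation chosen here for illustration, not taken from the notebook: $\hat{r}_{u,i}$ is the predicted rating of user $u$ for item $i$, $\bar{r}_i$ is the mean rating of item $i$, $s_{ij}$ is the similarity between items $i$ and $j$, and $N(i)$ is the set of neighbour items chosen for $i$.

```latex
\hat{r}_{u,i} = \bar{r}_i + \frac{\sum_{j \in N(i)} s_{ij}\,\left(r_{u,j} - \bar{r}_j\right)}{\sum_{j \in N(i)} s_{ij}}
```

The denominator assumes every selected neighbour has a positive similarity, which holds for the two neighbours used in this notebook. Many formulations use $|s_{ij}|$ in the denominator so that negative similarities are handled as well.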

In [1]:
import pandas as pd
import numpy as np
df = pd.read_csv("demo_data.csv", header=0, index_col=None)
df

Unnamed: 0,name,item1,item2,item3,item4,item5
0,Alice,5,3,4,4,-1
1,user1,3,1,2,3,3
2,user2,4,3,4,3,5
3,user3,3,3,1,5,4
4,user4,1,5,5,2,1


In [2]:
# Cosine similarity between items (Alice's row is skipped because her item5 rating is missing)
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(df.iloc[1:, 1:].T)
cosine_sim = pd.DataFrame(cosine_sim, columns=df.columns[1:], index=df.columns[1:])
cosine_sim

Unnamed: 0,item1,item2,item3,item4,item5
item1,1.0,0.738988,0.747667,0.936916,0.9941
item2,0.738988,1.0,0.933564,0.813629,0.738851
item3,0.747667,0.933564,1.0,0.709718,0.72261
item4,0.936916,0.813629,0.709718,1.0,0.939558
item5,0.9941,0.738851,0.72261,0.939558,1.0
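
To make clear what `cosine_similarity` computes here, the sketch below rebuilds the item-item matrix with plain NumPy. The ratings are copied by hand from the table above (user1 to user4 only), and `cosine_matrix` is a hypothetical helper name used for illustration.

```python
import numpy as np

# Ratings of user1..user4 (rows) for item1..item5 (columns), copied from the table above.
ratings = np.array([
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
], dtype=float)


def cosine_matrix(r):
    """Item-item cosine similarity: scale each column to unit length, then take dot products."""
    unit = r / np.linalg.norm(r, axis=0, keepdims=True)
    return unit.T @ unit


manual_cosine = cosine_matrix(ratings)
# Last row: similarity of item5 to item1..item5
print(np.round(manual_cosine[4], 6))
```

Because all ratings are positive, every cosine value lands close to 1, which makes the items hard to tell apart. That is one reason to also look at a mean-centred measure such as the Pearson coefficient.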


In [3]:
# Pearson correlation coefficient between items
pearson_sim = np.corrcoef(df.iloc[1:, 1:].T)
pearson_sim = pd.DataFrame(pearson_sim, columns=df.columns[1:], index=df.columns[1:])
pearson_sim

Unnamed: 0,item1,item2,item3,item4,item5
item1,1.0,-0.648886,-0.435286,0.473684,0.969458
item2,-0.648886,1.0,0.67082,-0.324443,-0.478091
item3,-0.435286,0.67082,1.0,-0.870572,-0.427618
item4,0.473684,-0.324443,-0.870572,1.0,0.581675
item5,0.969458,-0.478091,-0.427618,0.581675,1.0
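
The Pearson coefficient is the cosine similarity of the mean-centred rating columns, which is why it can go negative while the cosine values above cannot. The sketch below checks this against `np.corrcoef`. It again uses a hand-copied `ratings` array, and the variable names are chosen here for illustration.

```python
import numpy as np

# Ratings of user1..user4 (rows) for item1..item5 (columns), copied from the table above.
ratings = np.array([
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
], dtype=float)

# Subtract each item's mean rating, then compute cosine similarity on the residuals.
centered = ratings - ratings.mean(axis=0, keepdims=True)
unit = centered / np.linalg.norm(centered, axis=0, keepdims=True)
manual_pearson = unit.T @ unit

# np.corrcoef treats each row as a variable, hence the transpose.
matches_numpy = np.allclose(manual_pearson, np.corrcoef(ratings.T))
print(matches_numpy)
```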


In [4]:
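# item1 and item4 turn out to be the items most correlated with item5
# Baseline: mean item5 rating over the users who rated it (Alice's -1 placeholder is skipped)
item5_base = np.average(df['item5'].iloc[1:])
# The Pearson similarities of item5 to its two neighbours serve as the weights
weight_list = [pearson_sim.loc['item5', 'item1'], pearson_sim.loc['item5', 'item4']]
weight_list_sum = np.sum(weight_list)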

In [5]:
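# Alice's own ratings of the two neighbour items
item1_score = df.iloc[0, 1]
item4_score = df.iloc[0, 4]
# Item means; unlike the similarities above, these include Alice's row
item1_avg = np.average(df.iloc[:, 1].values)
item4_avg = np.average(df.iloc[:, 4].values)
# How far Alice's ratings sit above or below each item's mean
item1_centered = item1_score - item1_avg
item4_centered = item4_score - item4_avg
# Baseline plus the similarity-weighted average of Alice's deviations
item5_pred_score = item5_base + (item1_centered * weight_list[0] + item4_centered * weight_list[1]) / weight_list_sum
item5_pred_score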

4.6
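
The hand-written steps above can be folded into one function. This is a minimal sketch, not a general recommender: `predict_item_cf` and its parameters `user_row`, `target` and `k` are hypothetical names, and the data is inlined rather than read from `demo_data.csv`. It assumes the target user has rated every selected neighbour and that the top-`k` similarities are positive. Like the cells above, it takes similarities and the baseline from the other users only, while the neighbour item means include the target user.

```python
import pandas as pd

# Same ratings as the table at the top of the notebook; -1 marks Alice's missing rating.
df = pd.DataFrame({
    "name": ["Alice", "user1", "user2", "user3", "user4"],
    "item1": [5, 3, 4, 3, 1],
    "item2": [3, 1, 3, 3, 5],
    "item3": [4, 2, 4, 1, 5],
    "item4": [4, 3, 3, 5, 2],
    "item5": [-1, 3, 5, 4, 1],
})


def predict_item_cf(df, user_row, target, k=2):
    """Predict the rating in row `user_row` for item `target` from its k most similar items."""
    items = df.drop(columns="name")
    others = items.drop(index=user_row)          # users who have rated the target item
    sim = others.corr(method="pearson")[target].drop(target)
    neighbours = sim.nlargest(k)                 # k items most correlated with the target
    base = others[target].mean()                 # baseline: mean rating of the target item
    # Deviation of the user's rating from each neighbour item's mean (all rows included)
    deviations = items.loc[user_row, neighbours.index] - items[neighbours.index].mean()
    return base + (deviations * neighbours).sum() / neighbours.sum()


prediction = predict_item_cf(df, user_row=0, target="item5", k=2)
print(round(prediction, 2))
```

With `k=2` the function selects item1 and item4, the same neighbours chosen by hand above, so it reproduces that prediction.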