# Recommendations

## Data

In [4]:
critics={
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,'Just My Luck': 3.0, 'Superman Returns': 3.5, 
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 
                     'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}, 
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,'The Night Listener': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
                     'You, Me and Dupree': 2.0}, 
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,'The Night Listener': 3.0, 'Superman Returns': 5.0, 
                      'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

In [5]:
critics['Lisa Rose']['Lady in the Water']

2.5

## Similarity scores
- Euclidean Distance Score
- Pearson Correlation Score

### Euclidean distance

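**Score:**
$$\frac{1}{1+\text{Euclidean distance}}$$
The score lies in `[0,1]`; the larger the value, the more similar the two people's ratings are (identical ratings give 1).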
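
As a worked example of the score, the sketch below applies the formula to Toby and Lisa Rose on the three films both of them rated. The ratings are copied from `critics`; the helper name `euclidean_score` is a hypothetical one used only for this illustration.

```python
from math import sqrt

def euclidean_score(a, b):
    """Return 1 / (1 + Euclidean distance) over the items rated in both a and b."""
    shared = [item for item in a if item in b]
    if not shared:
        return 0  # nothing in common, so treat the pair as unrelated
    distance = sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    return 1 / (1 + distance)

# Ratings copied from the critics dictionary (shared films only).
toby = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0}
lisa = {'Snakes on a Plane': 3.5, 'You, Me and Dupree': 2.5, 'Superman Returns': 3.5}

# Squared differences: 1.0 + 2.25 + 0.25 = 3.5, so the score is 1 / (1 + sqrt(3.5)).
score = euclidean_score(toby, lisa)
print(round(score, 3))
```

Comparing a critic with themselves gives a distance of 0 and therefore a score of exactly 1.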

In [6]:
from math import sqrt

In [18]:
def sim_distance(prefs,person1,person2):
    # Collect the items rated by both people
    shared_items={}
    for item in prefs[person1]:
        if item in prefs[person2]:
            shared_items[item]=1

    # No ratings in common: return 0
    if len(shared_items)==0: return 0

    # Sum of squared rating differences over the shared items
    sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) for item in shared_items])

    return 1/(1+sqrt(sum_of_squares))

**Compute the Euclidean distance score for every pair of critics:**

In [89]:
def calcu(prefs,func=sim_distance):
    key_l = list(prefs.keys())    # use the prefs argument, not the global critics
    score_d={}
    for i in range(0, len(key_l)-1):      # list indices start at 0
        for m in range(i+1, len(key_l)):
            score = func(prefs,key_l[i],key_l[m])
            score_d[key_l[i]+' : '+ key_l[m]]=score
    return score_d

In [91]:
score_d=calcu(critics)

In [92]:
score_sorted= sorted(score_d.items(),key=lambda item:item[1])  # sort the dict by value; returns a list of (key, value) pairs
# keys_sorted_by_value = sorted(score_d,key=lambda x:score_d[x])    # sort by value; returns only the keys, in order
score_sorted[-1]

('Jack Matthews : Gene Seymour', 0.6666666666666666)
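
The nested index loops in `calcu` visit every unordered pair of critics exactly once. As a sketch of the same enumeration, `itertools.combinations` can produce those pairs directly. The names `pairwise_scores` and `shared_count` and the `sample` data are hypothetical, introduced only to keep the example self-contained.

```python
from itertools import combinations

def pairwise_scores(prefs, func):
    """Score every unordered pair of people in prefs exactly once."""
    return {
        f'{p1} : {p2}': func(prefs, p1, p2)
        for p1, p2 in combinations(prefs, 2)
    }

def shared_count(prefs, p1, p2):
    """Stand-in similarity for the demo: how many items both people rated."""
    return len(set(prefs[p1]) & set(prefs[p2]))

# Tiny made-up data set, not taken from the critics dictionary.
sample = {
    'A': {'x': 1.0, 'y': 2.0},
    'B': {'x': 3.0, 'z': 4.0},
    'C': {'y': 5.0, 'z': 1.0, 'x': 2.0},
}

scores = pairwise_scores(sample, shared_count)
# max() picks the best pair without sorting the whole dictionary.
best_pair, best_score = max(scores.items(), key=lambda item: item[1])
```

Any function with the `(prefs, person1, person2)` signature, such as `sim_distance`, could be passed in place of `shared_count`.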

### Pearson correlation coefficient
References:
1. [How to understand the Pearson correlation coefficient? (Zhihu)](https://www.zhihu.com/question/19734616)
2. [Pearson product-moment correlation coefficient (Wikipedia)](https://zh.wikipedia.org/wiki/%E7%9A%AE%E5%B0%94%E9%80%8A%E7%A7%AF%E7%9F%A9%E7%9B%B8%E5%85%B3%E7%B3%BB%E6%95%B0)

$$r=\frac{\sum (x - \overline{x}) (y - \overline{y}) }{\sqrt{\sum (x - \overline{x})^2 \sum (y - \overline{y})^2}}$$
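
Pearson's r divides the summed product of the deviations from each mean by the square root of the product of the two sums of squared deviations. The sketch below computes that directly in plain Python, as a cross-check rather than a replacement for a library routine. The function name `pearson` is hypothetical, and returning 0.0 for a constant list is an assumption made for this example. The ratings are the six films shared by Lisa Rose and Gene Seymour, copied from `critics`.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: summed product of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(ss_x * ss_y)
    if denom == 0:
        return 0.0  # assumption: a constant list is treated as uncorrelated
    return cov / denom

# Same film order in both lists: Lady in the Water, Snakes on a Plane, Just My Luck,
# Superman Returns, You, Me and Dupree, The Night Listener.
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]

r = pearson(lisa, gene)
print(round(r, 3))
```

Here the numerator is 1.0 and the two sums of squares are 1.0 and 6.375, so r is 1/sqrt(6.375), roughly 0.396.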

In [85]:
from scipy.stats import pearsonr  # pearsonr returns r and the p-value

In [83]:
def sim_pearson(prefs,p1,p2):
    # Collect the items rated by both people
    shared_items={}
    for item in prefs[p1]:
        if item in prefs[p2]:
            shared_items[item]=1
    n=len(shared_items)

    # pearsonr needs at least two shared ratings
    if n<2: return 0

    x1=[prefs[p1][item] for item in shared_items]
    x2=[prefs[p2][item] for item in shared_items]
    r= pearsonr(x1,x2)[0]
    return r

In [84]:
sim_pearson(critics,'Lisa Rose','Gene Seymour')

0.39605901719066977

In [93]:
score_r=calcu(critics,sim_pearson)

In [100]:
score_r_sorted= sorted(score_r.items(),key=lambda item:item[1])
score_r_sorted[-1]

('Claudia Puig : Michael Phillips', 1.0)
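
A Pearson score of 1.0 does not mean the two critics gave identical ratings. Pearson correlation is unaffected by a constant offset, so one critic can rate every film consistently higher than the other and still correlate perfectly. The sketch below checks this for Claudia Puig and Michael Phillips on the three films both rated (values copied from `critics`), and shows that the distance-based score for the same pair stays below 1.

```python
from math import sqrt

# Ratings copied from the critics dictionary for the three films both critics rated.
claudia = {'Snakes on a Plane': 3.5, 'The Night Listener': 4.5, 'Superman Returns': 4.0}
michael = {'Snakes on a Plane': 3.0, 'The Night Listener': 4.0, 'Superman Returns': 3.5}

# Per-film gap: a constant gap means the two rating lists are perfectly linearly related.
offsets = {film: claudia[film] - michael[film] for film in claudia}
print(offsets)

# The distance-based score still penalizes the gap: 1 / (1 + sqrt(3 * 0.5**2)).
distance_score = 1 / (1 + sqrt(sum(d ** 2 for d in offsets.values())))
print(round(distance_score, 3))
```

Every offset is 0.5, which explains the correlation of 1.0, while the distance-based score for this pair comes out near 0.536.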