## Support, Confidence, and Lift in Association Rules

### Support
![title](支持度.png)
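### Confidence
![title](置信度.png)
- Support is an important measure because a rule with very low support may have occurred simply by chance. From a business perspective, a low-support rule is also likely to be uninteresting, because promoting items that customers seldom buy together is unlikely to pay off. For these reasons, support is often used to eliminate uninteresting rules. Support also has a desirable property that can be exploited to discover association rules efficiently.

- Confidence measures how reliable the inference made by a rule is. For a given rule X→Y, the higher the confidence, the more likely it is that Y appears in transactions that contain X. Confidence also provides an estimate of the conditional probability of Y given X.

- At the same time, the results of association analysis should be interpreted with care. An inference drawn from an association rule does not necessarily imply causality. It only reflects how likely the rule's antecedent and consequent are to occur together.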
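
To make the two measures concrete, here is a minimal pure-Python sketch. The helper names `support` and `confidence` and the toy baskets are illustrative assumptions, not part of the notebook's data or of any library.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): supp(X and Y together) / supp(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical toy baskets, for illustration only
toy_transactions = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]

# {milk, bread} appears in 2 of the 4 baskets
supp_milk_bread = support(["milk", "bread"], toy_transactions)
# bread appears in 2 of the 3 baskets that contain milk
conf_milk_to_bread = confidence(["milk"], ["bread"], toy_transactions)
print(supp_milk_bread, conf_milk_to_bread)
```

The "desirable property" of support mentioned above is presumably anti-monotonicity: adding items to an itemset can never increase its support, so any superset of an infrequent itemset can be skipped during the search.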

### Lift
![title](提升度.png)
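
The image above presumably gives the definition of lift. For reference, the standard definition can be written as follows, where $\mathrm{supp}$ and $\mathrm{conf}$ are notation assumed here rather than taken from the source:

$$
\mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{conf}(X \rightarrow Y)}{\mathrm{supp}(Y)} = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
$$

A lift of 1 means X and Y occur together about as often as they would if they were independent. A value above 1 suggests a positive association and a value below 1 a negative one.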

References:
- [Machine Learning (10): The Apriori Algorithm, Summary and Experiments](https://blog.csdn.net/zaishuiyifangxym/article/details/97645929?utm_medium=distribute.pc_relevant.none-task-blog-baidujs-1)
- [Statistical Analysis: Confidence, Support, and Lift of Association Rules](https://blog.csdn.net/weixin_42057852/article/details/82661667?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.nonecase)


## Using the efficient_apriori package
It runs efficiently, but it returns less information about each result.

In [41]:
import pandas as pd
import numpy as np
from efficient_apriori import apriori
# Display the value of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [40]:
dataset = pd.read_csv('./Market_Basket_Optimisation.csv', header = None)
dataset.head()
dataset.shape

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


(7501, 20)

In [49]:
# Collect the non-missing items of each row into a list of transactions
transactions = []
for i in range(dataset.shape[0]):
    temp = []
    for j in range(dataset.shape[1]):
        if str(dataset.values[i, j]) != 'nan':
            temp.append(str(dataset.values[i, j]))
    transactions.append(temp)
#print(transactions)
# Mine frequent itemsets and association rules
# min_support is the minimum support, min_confidence the minimum confidence
itemsets, rules = apriori(transactions, min_support=0.02, min_confidence=0.4)
print("Frequent itemsets:", itemsets)
print("Association rules:", rules)

Frequent itemsets: {1: {('avocado',): 250, ('energy drink',): 200, ('shrimp',): 536, ('frozen smoothie',): 475, ('low fat yogurt',): 574, ('green tea',): 991, ('cottage cheese',): 239, ('almonds',): 153, ('honey',): 356, ('olive oil',): 494, ('vegetables mix',): 193, ('tomato juice',): 228, ('mineral water',): 1788, ('salmon',): 319, ('eggs',): 1348, ('burgers',): 654, ('meatballs',): 157, ('turkey',): 469, ('milk',): 972, ('whole wheat rice',): 439, ('energy bar',): 203, ('french fries',): 1282, ('whole wheat pasta',): 221, ('soup',): 379, ('spaghetti',): 1306, ('frozen vegetables',): 715, ('cookies',): 603, ('cooking oil',): 383, ('champagne',): 351, ('chicken',): 450, ('oil',): 173, ('chocolate',): 1229, ('fresh tuna',): 167, ('tomatoes',): 513, ('red wine',): 211, ('pepper',): 199, ('ham',): 199, ('pancakes',): 713, ('grated cheese',): 393, ('fresh bread',): 323, ('ground beef',): 737, ('escalope',): 595, ('herb & pepper',): 371, ('strawberries',): 160, ('cake',): 608, ('hot dogs',): 243, ('bro
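
For intuition about how a minimum support threshold drives the search, the following is a minimal pure-Python sketch of a level-wise, Apriori-style frequent-itemset search. It is not the efficient_apriori implementation; the function name `frequent_itemsets` and the toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support count}."""
    baskets = [frozenset(t) for t in transactions]
    min_count = min_support * len(baskets)

    def count(candidates):
        # Count the baskets containing each candidate, keep the frequent ones
        counts = {c: sum(c <= b for b in baskets) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: single items
    level = count({frozenset([item]) for b in baskets for item in b})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy baskets, for illustration only
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a threshold of 0.5, the three single items and the three pairs are frequent, while the full triple appears in only two of the five baskets and is dropped.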

## Using the mlxtend.frequent_patterns package

In [52]:
from mlxtend.frequent_patterns import apriori as api
from mlxtend.frequent_patterns import association_rules

In [59]:
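# Data preparation: join the items of each transaction into one comma-separated string
temp_list = []
for i in range(dataset.shape[0]):
    temp_str = ''
    for j in range(dataset.shape[1]):
        if str(dataset.values[i, j]) != 'nan':
            temp_str += str(dataset.values[i, j]) + ','
    # Append once per row, after the inner loop; appending inside the loop
    # would add every partial prefix of the basket as a separate row
    temp_list.append(temp_str)
dataset_new = pd.DataFrame(data=temp_list)
dataset_new.columns = ['MarketBasket']
dataset_new.head()

Unnamed: 0,MarketBasket
0,"shrimp,almonds,avocado,vegetables mix,green gr..."
1,"burgers,meatballs,eggs,"
2,"chutney,"
3,"turkey,avocado,"
4,"mineral water,milk,energy bar,whole wheat rice..."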


In [55]:
# One-hot encode the data
dataset_new_hot_encoded = dataset_new.drop(columns='MarketBasket').join(dataset_new.MarketBasket.str.get_dummies(','))
dataset_new_hot_encoded = dataset_new_hot_encoded.dropna(axis=1)
dataset_new_hot_encoded.shape

(7501, 120)
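
As a sketch of what the one-hot step produces, the pure-Python example below builds one boolean column per distinct item and one row per transaction. The function name `one_hot` and the toy baskets are illustrative assumptions; the notebook itself relies on pandas `str.get_dummies` for this.

```python
def one_hot(transactions):
    """Return (sorted item names, boolean rows) for a list of transactions."""
    items = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in items] for t in transactions]
    return items, rows


# Hypothetical toy baskets, for illustration only
toy = [["milk", "bread"], ["bread"], ["eggs", "milk"]]
items, rows = one_hot(toy)

# The mean of each column is the support of the corresponding single item
supports = {
    item: sum(row[i] for row in rows) / len(rows)
    for i, item in enumerate(items)
}
print(items)
print(rows)
print(supports)
```

In this layout the encoded table has one row per transaction and one column per distinct item, which is why the shape above has 7501 rows.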

In [57]:
# Mine frequent itemsets
itemsets = api(dataset_new_hot_encoded, use_colnames=True, min_support=0.05)
itemsets = itemsets.sort_values(by="support", ascending=False)
print('-'*20, 'Frequent itemsets', '-'*20)
print(itemsets)

-------------------- Frequent itemsets --------------------
     support                    itemsets
16  0.238368             (mineral water)
6   0.179709                      (eggs)
21  0.174110                 (spaghetti)
8   0.170911              (french fries)
3   0.163845                 (chocolate)
12  0.132116                 (green tea)
15  0.129583                      (milk)
13  0.098254               (ground beef)
10  0.095321         (frozen vegetables)
18  0.095054                  (pancakes)
0   0.087188                   (burgers)
1   0.081056                      (cake)
4   0.080389                   (cookies)
7   0.079323                  (escalope)
14  0.076523            (low fat yogurt)
19  0.071457                    (shrimp)
22  0.068391                  (tomatoes)
17  0.065858                 (olive oil)
9   0.063325           (frozen smoothie)
23  0.062525                    (turkey)
2   0.059992                   (chicken)
27  0.059725  (mineral water, spaghetti)
24  0.0585

In [58]:
# Derive association rules from the frequent itemsets
# Filter on the lift metric, keeping rules with lift >= 1
rules = association_rules(itemsets, metric='lift', min_threshold=1)
rules = rules.sort_values(by="lift", ascending=False)
print('-'*20, 'Association rules', '-'*20)
print(rules)

-------------------- Association rules --------------------
       antecedents      consequents  antecedent support  consequent support  \
0  (mineral water)      (spaghetti)            0.238368            0.174110   
1      (spaghetti)  (mineral water)            0.174110            0.238368   
3      (chocolate)  (mineral water)            0.163845            0.238368   
2  (mineral water)      (chocolate)            0.238368            0.163845   
4  (mineral water)           (eggs)            0.238368            0.179709   
5           (eggs)  (mineral water)            0.179709            0.238368   

    support  confidence      lift  leverage  conviction  
0  0.059725    0.250559  1.439085  0.018223    1.102008  
1  0.059725    0.343032  1.439085  0.018223    1.159314  
3  0.052660    0.321400  1.348332  0.013604    1.122357  
2  0.052660    0.220917  1.348332  0.013604    1.073256  
4  0.050927    0.213647  1.188845  0.008090    1.043158  
5  0.050927    0.283383  1.188845  0.008090    1.06
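
The metric columns can be cross-checked by hand. The sketch below applies the standard definitions of confidence, lift, leverage, and conviction to the three support values from the first row of the table (mineral water → spaghetti). The helper name `rule_metrics` is an assumption for illustration, and because the inputs are the rounded values printed above, the results agree with the table only to within rounding.

```python
def rule_metrics(supp_x, supp_y, supp_xy):
    """Standard metrics for a rule X -> Y, computed from three support values."""
    confidence = supp_xy / supp_x
    return {
        "confidence": confidence,
        "lift": confidence / supp_y,
        "leverage": supp_xy - supp_x * supp_y,
        # Undefined when confidence == 1 (division by zero)
        "conviction": (1 - supp_y) / (1 - confidence),
    }


# Supports copied from row 0 of the printed table (rounded to 6 decimals)
m = rule_metrics(supp_x=0.238368, supp_y=0.174110, supp_xy=0.059725)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Lift and leverage are symmetric in X and Y, which is why each pair of mirrored rules in the table shares those two values while confidence and conviction differ.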