# Apriori Algorithm


**Reference**:
<br>Overview of the Apriori algorithm:<br>
https://wizardforcel.gitbooks.io/dm-algo-top10/content/apriori.html
<br>
https://www.cnblogs.com/nxld/p/6380417.html
<br>Supermarket dataset:<br>https://drive.google.com/file/d/1y5DYn0dGoSbC22xowBq2d4po6h1JxcTQ/view
<br>
https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python/

In [1]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from AAA import apyori

In [2]:
# Load the data
df = pd.read_csv('store_data.csv', header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


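This dataset describes about 7,500 transactions made over one week at a retail store in France. Each row lists the items bought in a single transaction, one item per column, with up to 20 items per row. For example, row 1 contains three items: burgers, meatballs, and eggs.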

In [3]:
# Drop the empty cells and collect the items of each transaction into a list
record = df.stack().groupby(level=0).apply(list).tolist()
# Inspect the first 20 records:
for i in range(0, 20):
    print(record[i])

['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
['burgers', 'meatballs', 'eggs']
['chutney']
['turkey', 'avocado']
['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea']
['low fat yogurt']
['whole wheat pasta', 'french fries']
['soup', 'light cream', 'shallot']
['frozen vegetables', 'spaghetti', 'green tea']
['french fries']
['eggs', 'pet food']
['cookies']
['turkey', 'burgers', 'mineral water', 'eggs', 'cooking oil']
['spaghetti', 'champagne', 'cookies']
['mineral water', 'salmon']
['mineral water']
['shrimp', 'chocolate', 'chicken', 'honey', 'oil', 'cooking oil', 'low fat yogurt']
['turkey', 'eggs']
['turkey', 'fresh tuna', 'tomatoes', 'spaghetti', 'mineral water', 'black tea', 'salmon', 'eggs', 'chicken', 'extra dark chocolate']
['m
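
The `stack()` call drops the `NaN` padding, so each transaction keeps only the items that were actually bought. As a cross-check, the same list of lists can be built without pandas. The sketch below uses only the standard library and a three-row inline sample in the same layout as the table above; `load_transactions` and `SAMPLE` are hypothetical names, not part of the notebook.

```python
import csv
import io

# Inline sample in the same layout as store_data.csv: no header,
# one transaction per row, short rows padded with empty cells.
SAMPLE = """burgers,meatballs,eggs,,
chutney,,,,
turkey,avocado,,,
"""


def load_transactions(handle):
    """Return one list of item names per row, skipping empty cells."""
    transactions = []
    for row in csv.reader(handle):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore rows that are completely blank
            transactions.append(items)
    return transactions


sample_records = load_transactions(io.StringIO(SAMPLE))
print(sample_records)
```

For the real file, `load_transactions(open('store_data.csv', newline=''))` should produce the same structure as `record`, assuming the file contains no malformed rows.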

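**Key parameters of the Apriori algorithm:**
* `min_support`: minimum support, where support = number of occurrences / total number of transactions<br>
For an association rule A->B, support = P(AB), the probability that A and B occur together in a transaction.
* `min_confidence`: minimum confidence<br>
confidence = P(B|A) = P(AB)/P(A), the probability that B occurs given that A occurs. For example, in the rule Computer => antivirus_software with support=2% and confidence=60%, 2% of all transactions include both a computer and antivirus software, and 60% of the customers who bought a computer also bought antivirus software.
* `min_lift`: minimum lift<br>
Lift is the ratio of the confidence to the expected confidence: lift = P(B|A)/P(B).
* `min_length`: minimum number of items in a rule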
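
Written out as formulas, the three measures for a rule A => B look as follows. The notation is introduced here for illustration: N is the total number of transactions and count(X) is the number of transactions that contain every item in X.

```latex
\begin{aligned}
\mathrm{support}(A \Rightarrow B) &= P(AB) = \frac{\mathrm{count}(A \cup B)}{N} \\
\mathrm{confidence}(A \Rightarrow B) &= P(B \mid A) = \frac{P(AB)}{P(A)} \\
\mathrm{lift}(A \Rightarrow B) &= \frac{P(B \mid A)}{P(B)} = \frac{P(AB)}{P(A)\,P(B)}
\end{aligned}
```

A lift of 1 corresponds to A and B being independent; a lift above 1 suggests they appear together more often than independence would predict.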

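A simple example: Table 1 is a customer purchase database D containing 6 transactions, with itemset I = {tennis racket, tennis ball, sneakers, shuttlecock}. Consider the association rule tennis racket => tennis ball. Transactions 1, 2, 3, 4, and 6 contain a tennis racket, and transactions 1, 2, and 6 contain both a tennis racket and a tennis ball, so support = 3/6 = 0.5 and confidence = 3/5 = 0.6. Given a minimum support of 0.5 and a minimum confidence of 0.6, the rule tennis racket => tennis ball meets both thresholds and is considered interesting: buying a tennis racket is strongly associated with buying a tennis ball.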
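
The counts in this example can be reproduced in a few lines. The sketch below uses a hypothetical reconstruction of the six transactions: only the placement of the tennis racket and tennis ball follows the text, and the remaining items are filled in arbitrarily. `count_containing` is an illustrative helper name.

```python
# Hypothetical reconstruction of the six transactions: only the placement of
# "tennis racket" and "tennis ball" follows the text; other items are made up.
transactions = {
    1: {"tennis racket", "tennis ball", "sneakers"},
    2: {"tennis racket", "tennis ball"},
    3: {"tennis racket"},
    4: {"tennis racket", "sneakers"},
    5: {"sneakers", "shuttlecock"},
    6: {"tennis racket", "tennis ball"},
}


def count_containing(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions.values() if itemset <= basket)


antecedent = {"tennis racket"}
consequent = {"tennis ball"}

n_total = len(transactions)
n_antecedent = count_containing(antecedent)
n_both = count_containing(antecedent | consequent)

support = n_both / n_total          # share of all transactions with both items
confidence = n_both / n_antecedent  # share of racket transactions that also have a ball

print(f"support = {n_both}/{n_total} = {support}")
print(f"confidence = {n_both}/{n_antecedent} = {confidence}")
```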

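**Building association rules between pairs of items**
<br>Suppose we only want rules for items bought at least 5 times a day, that is, 7 x 5 = 35 times over the week the dataset covers. The minimum support is then 35/7500 ≈ 0.0047; the call below uses the slightly lower 0.0045. The minimum confidence is set to 35% and the minimum lift to 3. `min_length` is set to 2 because a rule needs at least two products.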
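
The threshold arithmetic can be checked directly. This is a small sketch; the variable names are illustrative and the transaction total is the approximate figure quoted in the text.

```python
purchases_per_day = 5
days = 7
total_transactions = 7500  # approximate dataset size quoted in the text

# Minimum number of transactions an itemset must appear in over the week
min_count = purchases_per_day * days

# The same threshold expressed as a fraction of all transactions
min_support = min_count / total_transactions

print(min_count)
print(round(min_support, 4))
```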

In [4]:
association_rules = apyori.apriori(record, min_support=0.0045, min_confidence=0.35, min_lift=3, min_length=2)
association_results = list(association_rules)
# print(association_results)

In [5]:
for item in association_results:
    pair = item[0]  # frozenset of items; its iteration order is arbitrary
    items = [x for x in pair]
    # Only the first two items and the first rule of each itemset are printed
    print("Rule: " + items[0] + " => " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("------------------------")

Rule: pasta => escalope
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
------------------------
Rule: ground beef => tomato sauce
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
------------------------
Rule: cooking oil => ground beef
Support: 0.004799360085321957
Confidence: 0.5714285714285714
Lift: 3.2819951870487856
------------------------
Rule: olive oil => milk
Support: 0.004799360085321957
Confidence: 0.4235294117647058
Lift: 3.2684095860566447
------------------------
Rule: ground beef => mineral water
Support: 0.006665777896280496
Confidence: 0.39062500000000006
Lift: 3.975682666214383
------------------------
Rule: ground beef => spaghetti
Support: 0.006399146780429276
Confidence: 0.3934426229508197
Lift: 4.004359721511667
------------------------
Rule: ground beef => spaghetti
Support: 0.005999200106652446
Confidence: 0.5232558139534884
Lift: 3.005315360233627
------------------------


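**Analysis of the results:**<br>
The first rule shows that escalope and pasta are often bought together. The support is 0.0059: the number of transactions containing both pasta and escalope divided by the total number of transactions. The confidence is 0.3729, meaning that 37.29% of the transactions containing pasta also contain escalope. The lift of 4.7 means that customers who buy pasta are 4.7 times as likely to buy escalope as the average customer. A retailer could therefore place escalope and pasta next to each other.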
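
One caveat when reading the printed rules: `item[0]` is a `frozenset`, so the order in which its members come out carries no meaning, and an itemset with three items shows only two of them. The antecedent and consequent of each rule are stored in the `items_base` and `items_add` fields of every `OrderedStatistic` (the field names are visible in the printed `RelationRecord` output further down). Here is a sketch of a printer that uses those fields. It runs on stand-in namedtuples with made-up numbers so that it does not need the dataset, and `format_rules` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in apyori's printed output;
# the numbers in `demo` below are made up for illustration.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)


def format_rules(records):
    """Return one 'base => add' line per ordered statistic."""
    lines = []
    for record in records:
        for stat in record.ordered_statistics:
            if not stat.items_base:  # skip entries with an empty antecedent
                continue
            base = ", ".join(sorted(stat.items_base))
            add = ", ".join(sorted(stat.items_add))
            lines.append(
                f"{base} => {add} "
                f"(support={record.support:.4f}, "
                f"confidence={stat.confidence:.4f}, lift={stat.lift:.2f})"
            )
    return lines


demo = [
    RelationRecord(
        items=frozenset({"bread", "butter", "jam"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(
                frozenset({"bread", "butter"}), frozenset({"jam"}), 0.5, 3.2
            ),
        ],
    )
]

for line in format_rules(demo):
    print(line)
```

Since the stand-ins use the same field names, calling `format_rules(association_results)` on the real results should work the same way, assuming the installed apyori version exposes those fields as named attributes.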




In [6]:
ar = apyori.apriori(record, min_support=0.01, min_confidence=0.1)
a_results = list(ar)
print(a_results)
for item in a_results:
    pair = item[0]
    items = [x for x in pair]
    # Single-item itemsets also pass these thresholds; skip them,
    # otherwise items[1] raises an IndexError
    if len(items) < 2:
        continue
    print("Rule: " + items[0] + " => " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("------------------------")

[RelationRecord(items=frozenset({'chocolate'}), support=0.1638448206905746, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'chocolate'}), confidence=0.1638448206905746, lift=1.0)]), RelationRecord(items=frozenset({'eggs'}), support=0.17970937208372217, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'eggs'}), confidence=0.17970937208372217, lift=1.0)]), RelationRecord(items=frozenset({'french fries'}), support=0.1709105452606319, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'french fries'}), confidence=0.1709105452606319, lift=1.0)]), RelationRecord(items=frozenset({'green tea'}), support=0.13211571790427942, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'green tea'}), confidence=0.13211571790427942, lift=1.0)]), RelationRecord(items=frozenset({'milk'}), support=0.12958272230369283, ordered_statistics=[OrderedStatistic(items_base=frozenset(), ite

In [8]:
myTree = ['a',   # root
      ['b',  # left subtree
       ['d', [], []],
       ['e', [], []] ],
      ['c',  # right subtree
       ['f', [], []],
       [] ]
     ]  # closes the outer list; the missing bracket caused a SyntaxError

print(myTree)

['a', ['b', ['d', [], []], ['e', [], []]], ['c', ['f', [], []], []]]