* Association rules, market basket analysis, itemsets.
* The **support** of an itemset X is the percentage of transactions in the dataset that contain X; the support of a rule X → Y is the support of X ∪ Y. A rule with low support is of little use, because acting on it dedicates resources to a target with small impact.
* The **confidence** of a rule X → Y is the percentage of transactions containing X that also contain Y, i.e. support(X ∪ Y) / support(X). It estimates how likely a transaction containing one item (or itemset) is to also contain another.
* Useful rules satisfy both a user-specified **minimum support** and a **minimum confidence**.
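* The simplest algorithm for generating association rules is the **apriori algorithm**. It first finds every frequent itemset (one that meets minimum support) and then derives from those itemsets the rules that meet minimum confidence.
* **Downwards closure property**: every non-empty subset of a frequent itemset is also frequent, since any transaction containing the itemset also contains each of its subsets. Equivalently, no superset of an infrequent itemset can be frequent.
* Apriori exploits downwards closure to stay efficient: a candidate itemset is counted only if all of its subsets are already known to be frequent.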
* On the kth pass, apriori takes the frequent (k-1)-itemsets found on the previous pass and combines them into candidate k-itemsets. It then reads the transactions, keeps a running tally of how many contain each candidate, and retains the candidates that meet minimum support as the frequent k-itemsets.
* This is an example of a **level-wise search**.
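
The support and confidence definitions above can be made concrete with a small sketch. The basket data, the function names `support`, `confidence` and `rules_from_itemset`, and the use of fractions instead of percentages are all illustrative assumptions, not part of the `apriori` module imported below.

```python
from itertools import combinations

# Toy market-basket data (hypothetical, for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), transactions) / lhs_support


def rules_from_itemset(itemset, transactions, minconf):
    """Yield (lhs, rhs, confidence) for every split of `itemset` meeting minconf."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            rhs = tuple(i for i in items if i not in lhs)
            conf = confidence(lhs, rhs, transactions)
            if conf >= minconf:
                yield lhs, rhs, conf


if __name__ == "__main__":
    print(support({"diapers", "beer"}, TRANSACTIONS))                 # 0.6
    print(round(confidence({"diapers"}, {"beer"}, TRANSACTIONS), 2))  # 0.75
    # Only beer -> diapers (confidence 1.0) clears a 0.8 threshold.
    print(list(rules_from_itemset({"beer", "diapers"}, TRANSACTIONS, 0.8)))
```

Note that confidence is not symmetric: on this data diapers → beer has confidence 0.75, while beer → diapers has confidence 1.0, even though both rules have the same support.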

In [1]:
import apriori

In [2]:
# apriori.init_pass("75000-out1.csv", 0.1)[0]
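
The first pass, which the commented-out `apriori.init_pass` call appears to perform, only has to count individual items. A minimal sketch, assuming the transactions are already loaded into memory as lists of item IDs (the layout of `75000-out1.csv` is not shown here); `first_pass` and its return shape are hypothetical.

```python
from collections import Counter


def first_pass(transactions, minsup):
    """Hypothetical first apriori pass: count single items, keep the frequent ones.

    `transactions` is a list of item collections; `minsup` is a fraction (0.01 = 1%).
    Returns (frequent 1-itemsets as one-element lists, per-item transaction counts).
    """
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # count each item at most once per transaction
    n = len(transactions)
    frequent = [[item] for item in sorted(counts) if counts[item] / n >= minsup]
    return frequent, counts
```

On toy data, `first_pass([[7, 28], [7, 45], [28, 45, 7], [3]], 0.5)[0]` returns `[[7], [28], [45]]`: item 3 appears in only one of the four transactions and falls below 50% support.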

In [3]:
# Set: [[7], [28], [45]]
# apriori.candidate_gen(apriori.init_pass("75000-out1.csv", 0.1)[0])

In [4]:
# The following call runs the apriori algorithm with a minimum support of 1%.
apriori.apriori("75000-out1.csv", 0.01)



[[0],
 [1],
 [2],
 [3],
 [4],
 [5],
 [6],
 [7],
 [8],
 [9],
 [10],
 [11],
 [12],
 [13],
 [14],
 [15],
 [16],
 [17],
 [18],
 [19],
 [20],
 [21],
 [22],
 [23],
 [24],
 [25],
 [26],
 [27],
 [28],
 [29],
 [30],
 [31],
 [32],
 [33],
 [34],
 [35],
 [36],
 [37],
 [38],
 [39],
 [40],
 [41],
 [42],
 [43],
 [44],
 [45],
 [46],
 [47],
 [48],
 [49],
 [3, 35],
 [7, 15],
 [23, 40],
 [41, 43],
 [24, 40],
 [16, 45],
 [7, 49],
 [12, 31],
 [29, 47],
 [11, 45],
 [0, 2],
 [7, 11],
 [31, 36],
 [33, 42],
 [17, 47],
 [2, 46],
 [7, 45],
 [31, 48],
 [37, 45],
 [23, 41],
 [40, 41],
 [7, 37],
 [27, 28],
 [15, 49],
 [32, 45],
 [16, 32],
 [1, 19],
 [0, 46],
 [36, 48],
 [23, 43],
 [40, 43],
 [24, 41],
 [18, 35],
 [3, 18],
 [17, 29],
 [12, 36],
 [24, 43],
 [23, 24],
 [12, 48],
 [14, 44],
 [4, 9],
 [5, 22],
 [11, 37],
 [12, 31, 48],
 [7, 15, 49],
 [24, 40, 41],
 [23, 24, 40],
 [40, 41, 43],
 [17, 29, 47],
 [23, 40, 43],
 [23, 24, 41],
 [16, 32, 45],
 [31, 36, 48],
 [24, 41, 43],
 [7, 37, 45],
 [23, 24, 43],
 [11, 37,
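
The level-wise search described in the notes at the top can be sketched end to end in plain Python. This is a hypothetical stand-in for `apriori.apriori`, not its actual source: it takes an in-memory list of transactions instead of a CSV path, expresses `minsup` as a fraction, and returns sorted tuples rather than lists.

```python
from itertools import combinations


def apriori_sketch(transactions, minsup):
    """Return every frequent itemset as a sorted tuple, found level by level.

    Hypothetical stand-in for `apriori.apriori`: `transactions` is an in-memory
    list of item collections and `minsup` is a fraction (0.01 means 1%).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # Pass 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = sorted(c for c, cnt in counts.items() if cnt / n >= minsup)
    result = list(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        candidates = set()
        for a, b in combinations(frequent, 2):
            # Join: merge two (k-1)-itemsets that agree on their first k-2 items.
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)  # stays sorted because `frequent` is sorted
            # Prune (downwards closure): every (k-1)-subset must be frequent.
            if all(sub in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)

        # Pass k: scan the transactions and tally each surviving candidate.
        tally = dict.fromkeys(candidates, 0)
        for t in transactions:
            for cand in candidates:
                if t.issuperset(cand):
                    tally[cand] += 1
        frequent = sorted(c for c, cnt in tally.items() if cnt / n >= minsup)
        result.extend(frequent)
        k += 1
    return result
```

For example, `apriori_sketch([[1, 2, 3], [1, 2], [1, 3], [2, 3], [1, 2, 3]], 0.5)` returns `[(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]`; the triple `(1, 2, 3)` is counted but appears in only 2 of 5 transactions. Checking every candidate against every transaction keeps the sketch short; the original apriori paper speeds up the counting pass with a hash tree over the candidates.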

In [5]:
# Generate candidate 3-itemsets from the first 28 frequent 2-itemsets found above.
apriori.candidate_gen(
    [[3, 35], [7, 15], [23, 40], [41, 43], [24, 40], [16, 45], [7, 49],
     [12, 31], [29, 47], [11, 45], [0, 2], [7, 11], [31, 36], [33, 42],
     [17, 47], [2, 46], [7, 45], [31, 48], [37, 45], [23, 41], [40, 41],
     [7, 37], [27, 28], [15, 49], [32, 45], [16, 32], [1, 19], [0, 46]])

[[7, 15, 49],
 [7, 15, 45],
 [23, 40, 41],
 [16, 32, 45],
 [7, 45, 49],
 [0, 2, 46],
 [7, 11, 45],
 [31, 36, 48],
 [7, 37, 45]]
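
The candidate generation step demonstrated above can be sketched as follows, assuming the standard apriori join-and-prune scheme: join two (k-1)-itemsets that share all but their last item, then drop any candidate that has a (k-1)-subset missing from the input. `candidate_gen_sketch` is a hypothetical name; the source of `apriori.candidate_gen` is not shown here.

```python
from itertools import combinations


def candidate_gen_sketch(frequent):
    """Generate candidate k-itemsets from a list of frequent (k-1)-itemsets.

    Hypothetical re-implementation for comparison with `apriori.candidate_gen`;
    itemsets are lists of sortable items, as in the call above.
    """
    prev = sorted(tuple(sorted(f)) for f in frequent)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join step: same first k-2 items; sorting guarantees a[-1] < b[-1].
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune step: drop the candidate if any (k-1)-subset is not frequent.
            if all(sub in prev_set for sub in combinations(cand, len(a))):
                candidates.append(list(cand))
    return candidates
```

Run on the same 28 pairs passed to `apriori.candidate_gen` above, this sketch returns six of the nine candidates listed: `[0, 2, 46]`, `[7, 11, 45]`, `[7, 15, 49]`, `[7, 37, 45]`, `[16, 32, 45]` and `[23, 40, 41]`. It also prunes `[7, 15, 45]`, `[7, 45, 49]` and `[31, 36, 48]`, because their subsets `[15, 45]`, `[45, 49]` and `[36, 48]` are not in that input list.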