# Apriori 
The Apriori algorithm can be considered the foundational algorithm of basket analysis, the study of which products customers put together in their shopping baskets.
The goal is to find combinations of products that are often bought together, which we call frequent itemsets. The technical term for the domain is Frequent Itemset Mining.
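The Apriori algorithm works level by level (horizontally), imitating the **Breadth-First Search of a graph**: it first finds the frequent single items, then combines them into candidate pairs, then triples, and so on. A candidate is discarded as soon as one of its subsets is known to be infrequent, because a set of products can never occur more often than any of its subsets.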

## Steps of the Apriori algorithm
* Step 1. Computing the support for each individual item
* Step 2. Deciding on the support threshold
* Step 3. Selecting the frequent items
* Step 4. Finding the support of the frequent itemsets
* Step 5. Repeat for larger sets
* Step 6. Generate Association Rules and compute confidence
* Step 7. Compute lift
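Steps 1 to 5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not efficient-apriori's implementation: the support threshold is an absolute count, the function name `apriori_frequent_itemsets` and the toy baskets are hypothetical, and candidate generation is simplified (it enumerates k-item combinations of the items that are still frequent and prunes every candidate with an infrequent (k-1)-item subset, instead of the classic prefix join).

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_count):
    """Level-wise (breadth-first) search for all itemsets contained in at
    least `min_count` transactions. Returns {size: {itemset: count}}."""
    baskets = [set(t) for t in transactions]

    def support_counts(candidates):
        # Steps 1 and 4: one scan over the baskets per level.
        return {c: sum(1 for b in baskets if b.issuperset(c)) for c in candidates}

    # Steps 1-3: count every single item and keep those reaching the threshold.
    items = sorted({item for b in baskets for item in b})
    level = {c: n for c, n in support_counts([(i,) for i in items]).items()
             if n >= min_count}

    result, k = {}, 1
    while level:  # one iteration per level of the breadth-first search
        result[k] = level
        k += 1
        # Step 5: build k-item candidates from the items that are still frequent,
        # pruning any candidate that has an infrequent (k-1)-item subset.
        frequent_items = sorted({i for c in level for i in c})
        candidates = [c for c in combinations(frequent_items, k)
                      if all(s in level for s in combinations(c, k - 1))]
        level = {c: n for c, n in support_counts(candidates).items()
                 if n >= min_count}
    return result


# Hypothetical toy baskets; note the trailing comma in the one-item tuple.
toy = [
    ("bread", "milk"),
    ("bread", "butter", "milk"),
    ("bread", "butter"),
    ("bread", "butter", "milk"),
    ("milk",),
]
result = apriori_frequent_itemsets(toy, min_count=3)
print(result)
```

The triple (bread, butter, milk) is never counted: its subset (butter, milk) occurs only twice, so the pruning step discards it before the scan. That pruning is what keeps the breadth-first search from exploring every possible combination of products.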

## Apriori accuracy: how to balance support, confidence, and lift of a rule?
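A rule X => Y ("customers who buy X also buy Y") gives us three metrics to interpret:
* **support**: the number of transactions (or the fraction of all transactions) in which the products of the rule occur together
* **confidence**: the fraction of the transactions containing the left-hand side that also contain the right-hand side, i.e. the conditional probability of the right-hand side given the left-hand side
* **lift**: the strength of the association, i.e. the confidence divided by the overall support of the right-hand side; a lift above 1 means the products are bought together more often than if they were bought independently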
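Written out for a rule X => Y over N transactions, where count(·) is the number of transactions that contain all the given products, the three metrics are:

$$
\begin{aligned}
\text{support}(X \Rightarrow Y) &= \frac{\text{count}(X \cup Y)}{N} \\
\text{confidence}(X \Rightarrow Y) &= \frac{\text{count}(X \cup Y)}{\text{count}(X)} = P(Y \mid X) \\
\text{lift}(X \Rightarrow Y) &= \frac{\text{confidence}(X \Rightarrow Y)}{\text{count}(Y)/N} = \frac{P(Y \mid X)}{P(Y)}
\end{aligned}
$$

A lift of exactly 1 means that buying X tells us nothing about buying Y; the further above 1, the stronger the association.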

```
!pip install efficient-apriori
```

The results in this notebook were produced with efficient-apriori 2.0.1.


```python
import pandas as pd
```

```python
# Store the transactions (item sets) as tuples of strings in a list
transactions = [
    ("beer", "wine", "cheese"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "cheese"),
    # Without a trailing comma this is a plain string, not a one-item tuple
    # (see row 5 of the DataFrame below); ("potato chips",) would be a tuple.
    ("potato chips"),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "beer"),
    ("beer", "potato chips"),
    ("butter", "eggs"),
    ("beer", "potato chips"),
    ("flour", "eggs"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("beer", "wine", "potato chips", "cheese"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
]
```

```python
# Convert the transaction list into a DataFrame to inspect it
data = pd.DataFrame(transactions)
data
```

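|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|----|----|
| 0 | beer | wine | cheese | | | | | | | | | |
| 1 | beer | potato chips | | | | | | | | | | |
| 2 | eggs | flour | butter | cheese | | | | | | | | |
| 3 | eggs | flour | butter | beer | potato chips | | | | | | | |
| 4 | wine | cheese | | | | | | | | | | |
| 5 | p | o | t | a | t | o | | c | h | i | p | s |
| 6 | eggs | flour | butter | wine | cheese | | | | | | | |
| 7 | eggs | flour | butter | beer | potato chips | | | | | | | |
| 8 | wine | beer | | | | | | | | | | |
| 9 | beer | potato chips | | | | | | | | | | |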


## Setting the parameters for the algorithm and running the algorithm
* **min_support:** the support threshold described in the steps above. The difference is that efficient-apriori expects it as a fraction of the number of transactions (a value between 0 and 1) rather than as an absolute count.
* **min_confidence:** a filter that discards rules whose confidence is below this value. Set it to zero if you want to see all the generated rules.
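To make the role of `min_confidence` concrete, here is a minimal sketch of rule generation followed by a confidence filter. It illustrates the idea only and is not efficient-apriori's code: the function `generate_rules` is hypothetical, and the itemset counts are copied from the output that the library prints for this dataset further down.

```python
from itertools import combinations

# Itemset counts as printed by the efficient-apriori run in this notebook.
counts = {
    ("beer",): 11, ("wine",): 8, ("cheese",): 8, ("potato chips",): 9, ("eggs",): 7,
    ("beer", "potato chips"): 9, ("cheese", "wine"): 7,
}


def generate_rules(counts, min_confidence):
    """Split each frequent itemset with 2+ items into lhs -> rhs and keep the
    rules whose confidence count(itemset) / count(lhs) reaches min_confidence."""
    rules = []
    for itemset, n_itemset in counts.items():
        for size in range(1, len(itemset)):  # empty range for single items
            for lhs in combinations(itemset, size):
                rhs = tuple(i for i in itemset if i not in lhs)
                confidence = n_itemset / counts[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, round(confidence, 3)))
    return rules


print(len(generate_rules(counts, min_confidence=0)))  # all candidate rules
print(generate_rules(counts, min_confidence=0.85))
```

With `min_confidence=0` all four rules survive; at 0.85, {beer} -> {potato chips} (confidence 9/11, about 0.818) is filtered out while the other three remain.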

```python
from efficient_apriori import apriori

# Our minimum support is 7 transactions, but efficient-apriori expects it
# as a fraction of all transactions: 7 / 20 = 0.35
min_support = 7 / len(transactions)

# min_confidence removes rules with low confidence.
# For now, set min_confidence = 0 to obtain all the rules.
min_confidence = 0
itemsets, rules = apriori(transactions, min_support=min_support, min_confidence=min_confidence)
```


```python
print(itemsets)
```

```
{1: {('beer',): 11, ('wine',): 8, ('cheese',): 8, ('potato chips',): 9, ('eggs',): 7}, 2: {('beer', 'potato chips'): 9, ('cheese', 'wine'): 7}}
```


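In the first pass (key `1`), you have the individual products that reach the minimum support of seven, with their number of occurrences (flour and butter, with six occurrences each, are filtered out):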
* beer: 11
* wine: 8
* cheese: 8
* potato chips: 9
* eggs: 7

In the second pass, only pairs built from those frequent products are considered, and we keep the pairs that also reach the minimum support of seven, again with their number of occurrences:
* (cheese, wine): 7
* (beer, potato chips): 9
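The two passes can be checked by hand with `collections.Counter`. One caveat: in the transactions cell, `("potato chips")` has no trailing comma, so it is a plain string rather than a one-item tuple; as row 5 of the DataFrame shows, it gets split into characters, and it is never counted as potato chips. That is why the library reports 9. The sketch below is a hand-rolled check, not efficient-apriori's implementation, and it uses the corrected tuple `("potato chips",)`: potato chips then reaches 10, while the set of frequent items and the two frequent pairs (with their counts) stay the same.

```python
from collections import Counter
from itertools import combinations

# The notebook's transactions, with the single-item transaction written as
# a proper one-element tuple ("potato chips",).
transactions = [
    ("beer", "wine", "cheese"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "cheese"),
    ("potato chips",),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("eggs", "flour", "butter", "beer", "potato chips"),
    ("wine", "beer"),
    ("beer", "potato chips"),
    ("butter", "eggs"),
    ("beer", "potato chips"),
    ("flour", "eggs"),
    ("beer", "potato chips"),
    ("eggs", "flour", "butter", "wine", "cheese"),
    ("beer", "wine", "potato chips", "cheese"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
    ("wine", "cheese"),
    ("beer", "potato chips"),
]
MIN_COUNT = 7

# First pass: how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in set(t))
frequent_items = {item: n for item, n in item_counts.items() if n >= MIN_COUNT}

# Second pass: count only pairs built from frequent items (sorted, so each
# pair has one canonical order), then keep the pairs that reach MIN_COUNT.
pair_counts = Counter(
    pair
    for t in transactions
    for pair in combinations(sorted(set(t) & set(frequent_items)), 2)
)
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= MIN_COUNT}

print(frequent_items)
print(frequent_pairs)
```

With the corrected data, the confidence of {potato chips} -> {beer} would be 9/10 = 0.9 instead of 1.0.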

## Inspecting the rules and their metrics

```python
for rule in rules:
    print(rule)
```

```
{potato chips} -> {beer} (conf: 1.000, supp: 0.450, lift: 1.818, conv: 450000000.000)
{beer} -> {potato chips} (conf: 0.818, supp: 0.450, lift: 1.818, conv: 3.025)
{wine} -> {cheese} (conf: 0.875, supp: 0.350, lift: 2.188, conv: 4.800)
{cheese} -> {wine} (conf: 0.875, supp: 0.350, lift: 2.188, conv: 4.800)
```
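These metrics can be recomputed from the itemset counts. The output also reports conviction, which the metric list above does not define: conviction is P(not Y) / P(not Y | X), and it becomes infinite when the confidence is 1. That is where the 450,000,000 for {potato chips} -> {beer} comes from, since the library evidently adds a tiny constant to the denominator. The sketch assumes that constant is 1e-9, a value chosen because 0.45 / 1e-9 reproduces the printed number; the names `rule_metrics` and `EPS` are illustrative, not part of efficient-apriori.

```python
N = 20  # number of transactions in this dataset

# (lhs, rhs, count(lhs), count(rhs), count(lhs and rhs)), from the itemset counts above
rules = [
    ("potato chips", "beer", 9, 11, 9),
    ("beer", "potato chips", 11, 9, 9),
    ("wine", "cheese", 8, 8, 7),
    ("cheese", "wine", 8, 8, 7),
]

EPS = 1e-9  # assumption: small constant that keeps conviction finite when confidence == 1


def rule_metrics(n_lhs, n_rhs, n_both, n_transactions):
    support = n_both / n_transactions
    confidence = n_both / n_lhs  # P(rhs | lhs)
    lift = n_both * n_transactions / (n_lhs * n_rhs)  # P(lhs and rhs) / (P(lhs) P(rhs))
    # Conviction: P(not rhs) / P(not rhs | lhs)
    conviction = (1 - n_rhs / n_transactions) / (1 - confidence + EPS)
    return support, confidence, lift, conviction


for lhs, rhs, n_lhs, n_rhs, n_both in rules:
    supp, conf, lift, conv = rule_metrics(n_lhs, n_rhs, n_both, N)
    print(f"{{{lhs}}} -> {{{rhs}}} "
          f"(conf: {conf:.3f}, supp: {supp:.3f}, lift: {lift:.3f}, conv: {conv:.3f})")
```

Run as is, it prints the same four lines as the output above. Read the huge conviction as "confidence of exactly 1", not as a rule that is millions of times stronger than the others.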


In **conclusion**, we could argue that the lift of the rules Wine => Cheese and Cheese => Wine (2.188) is very high: a customer who buys one of the two is more than twice as likely as the average customer to buy the other. The owner of this night store would probably want to put cheese and wine close to each other. The associations Potato Chips => Beer and Beer => Potato Chips (lift 1.818) are a bit less strong, but still high enough to also put beer and potato chips in the same place in the store.