# Association rule mining: Apriori algorithm

For this project, I used the efficient-apriori library instead of alternatives like mlxtend.frequent_patterns.

I chose this library because it runs the Apriori algorithm directly on transaction lists, without first converting the data into a wide, one-hot encoded table (which would require over 300 binary columns). This makes it both faster and more memory-efficient for this dataset, while still producing clear and interpretable association rules.
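To illustrate the difference between the two layouts, here is a minimal sketch on toy baskets (hypothetical data, not taken from `takeaway_orders.csv`). It only counts stored values in each representation; the variable names are illustrative and not part of either library.

```python
# Toy baskets (hypothetical, for illustration only)
transactions = [
    ["naan", "pilau rice"],
    ["plain papadum", "mango chutney", "naan"],
    ["pilau rice", "chicken tikka masala"],
]

# Wide one-hot layout: one boolean column per distinct item, one row per basket
items = sorted({item for basket in transactions for item in basket})
one_hot = [[item in basket for item in items] for basket in transactions]

# Number of stored values in each layout
cells_list = sum(len(basket) for basket in transactions)  # 7
cells_one_hot = len(one_hot) * len(items)                 # 3 rows * 5 columns = 15

print(cells_list, cells_one_hot)
```

The gap widens as the menu grows: with several hundred distinct items and only a handful of items per order, almost every cell of the one-hot table is `False`.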

In [1]:
! pip install efficient-apriori

Collecting efficient-apriori
  Downloading efficient_apriori-2.0.6-py3-none-any.whl.metadata (6.7 kB)
Downloading efficient_apriori-2.0.6-py3-none-any.whl (14 kB)
Installing collected packages: efficient-apriori
Successfully installed efficient-apriori-2.0.6


In [2]:
from efficient_apriori import apriori
import pandas as pd
orders = pd.read_csv("takeaway_orders.csv")

In [7]:
# transactions = list of baskets, for example [['naan','pilau rice'], ['papadum','chutney','naan'], ...]
transactions = (orders.groupby('Order ID')['Item Name']
                        .apply(list)
                        .tolist())
itemsets, rules = apriori(transactions,  min_support=0.01, min_confidence=0.2)  # default min_confidence is 0.5

Frequent itemsets containing 1 to 5 items are generated, each stored together with its frequency (the number of orders that contain it). With a minimum support threshold of 1% and roughly 20,000 orders overall, an itemset must appear in about 200 orders to be kept.
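The arithmetic behind that threshold can be sketched as follows. The order count is the rounded figure quoted above (an assumption; in the notebook, `len(transactions)` would give the exact value), and `count_to_support` is a hypothetical helper introduced only for this example.

```python
n_orders = 20_000    # approximate order count quoted above (rounded, assumption)
min_support = 0.01   # the threshold passed to apriori()

# Smallest number of orders an itemset needs in order to reach the threshold
min_count = min_support * n_orders

def count_to_support(count, n):
    """Convert an absolute itemset frequency into relative support."""
    return count / n

# 244 is the frequency printed for the 4-item itemset in the next cell
support_244 = count_to_support(244, n_orders)

print(min_count, support_244)
```

Because the order count is rounded, the resulting support values are approximate as well.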

In [8]:
# itemsets is a dict of dicts: {itemset size: {item tuple: frequency}}
# Examples of itemsets of each size and their frequencies
display(itemsets[1][('mango chutney',)],
        itemsets[2][('garlic naan', 'naan',)],
        itemsets[3][('onion bhaji', 'pilau rice','plain papadum')],
        itemsets[4][('garlic naan','mango chutney','mint sauce','plain papadum')],
        itemsets[5][('mango chutney','mint sauce','onion chutney','pilau rice','plain papadum')])

3435

740

812

244

248

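The rule-generation step of the algorithm is based on the observation that for {a, b} -> {c, d} to hold, both {a, b, c} -> {d} and {a, b, d} -> {c} must hold, since in general conf( {a, b, c} -> {d} ) >= conf( {a, b} -> {c, d} ). Both confidences share the same numerator, the support of {a, b, c, d}, while the support of {a, b, c} can never exceed the support of {a, b}. In other words, if either of the two one-consequent rules does not hold, there is no need to ever consider the two-consequent rule.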
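The inequality can be checked on a small example. The sketch below uses toy transactions over the items a, b, c, d (hypothetical data); `support_count` and `confidence` are illustrative helpers written for this example, not functions from efficient-apriori.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def confidence(transactions, lhs, rhs):
    """conf(lhs -> rhs) = count(lhs and rhs together) / count(lhs)."""
    both = support_count(transactions, set(lhs) | set(rhs))
    return both / support_count(transactions, lhs)

# Toy transactions (hypothetical)
transactions = [
    ("a", "b", "c", "d"),
    ("a", "b", "c"),
    ("a", "b", "d"),
    ("a", "b"),
    ("a", "b", "c", "d"),
]

conf_two = confidence(transactions, ("a", "b"), ("c", "d"))     # 2 / 5
conf_one_d = confidence(transactions, ("a", "b", "c"), ("d",))  # 2 / 3
conf_one_c = confidence(transactions, ("a", "b", "d"), ("c",))  # 2 / 3

# Moving an item from the consequent into the antecedent never lowers confidence
print(conf_two, conf_one_d, conf_one_c)
```

With a minimum confidence of 0.5, for instance, the two-consequent rule fails here even though both one-consequent rules pass; the pruning only works in the other direction, skipping the two-consequent rule once a one-consequent rule has failed.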

In [9]:
# rules is a list of efficient_apriori.Rule objects
len(rules)

1613

Association rules with the smallest and largest values of support, lift, and confidence, along with a random selection of other rules, were compared with those obtained in the Tableau dashboards. The results were identical, which indicates no noticeable mistakes or discrepancies in the calculations and conclusions.
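A cross-check of this kind can also be done by recomputing the four metrics directly from the transaction lists. The sketch below does so on toy baskets (hypothetical data, not the takeaway orders) using the standard definitions: support = P(lhs and rhs), confidence = P(rhs | lhs), lift = confidence / P(rhs), and conviction = (1 - P(rhs)) / (1 - confidence). `rule_metrics` is a hypothetical helper, not part of efficient-apriori.

```python
def rule_metrics(transactions, lhs, rhs):
    """Recompute support, confidence, lift and conviction for lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    n = len(transactions)
    count_lhs = sum(1 for t in transactions if lhs <= set(t))
    count_rhs = sum(1 for t in transactions if rhs <= set(t))
    count_both = sum(1 for t in transactions if (lhs | rhs) <= set(t))

    support = count_both / n
    confidence = count_both / count_lhs
    rhs_support = count_rhs / n
    lift = confidence / rhs_support
    # Conviction is undefined when confidence is exactly 1; report infinity in that case
    conviction = float("inf") if confidence == 1 else (1 - rhs_support) / (1 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}

# Toy baskets (hypothetical)
transactions = [
    ["plain papadum", "mango chutney"],
    ["plain papadum", "mango chutney", "naan"],
    ["plain papadum", "pilau rice"],
    ["naan", "pilau rice"],
    ["mango chutney", "plain papadum", "pilau rice"],
]

metrics = rule_metrics(transactions, ["plain papadum"], ["mango chutney"])
print(metrics)
```

Values computed this way can be compared with the `support`, `confidence`, `lift`, and `conviction` attributes of the corresponding rule object, allowing for rounding in the printed output.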

In [10]:
# Print the 100 rules with the highest support, in descending order
for r in sorted(rules, key=lambda rule: rule.support, reverse=True)[:100]:
    print(r) # Prints the rule and its confidence, support, lift, conviction

{pilau rice} -> {naan} (conf: 0.418, supp: 0.178, lift: 1.231, conv: 1.135)
{naan} -> {pilau rice} (conf: 0.524, supp: 0.178, lift: 1.231, conv: 1.206)
{plain papadum} -> {pilau rice} (conf: 0.504, supp: 0.165, lift: 1.184, conv: 1.158)
{pilau rice} -> {plain papadum} (conf: 0.387, supp: 0.165, lift: 1.184, conv: 1.098)
{plain papadum} -> {mango chutney} (conf: 0.457, supp: 0.150, lift: 2.618, conv: 1.521)
{mango chutney} -> {plain papadum} (conf: 0.856, supp: 0.150, lift: 2.618, conv: 4.688)
{plain papadum} -> {naan} (conf: 0.387, supp: 0.127, lift: 1.140, conv: 1.077)
{naan} -> {plain papadum} (conf: 0.373, supp: 0.127, lift: 1.140, conv: 1.073)
{pilau rice} -> {chicken tikka masala} (conf: 0.267, supp: 0.114, lift: 1.504, conv: 1.122)
{chicken tikka masala} -> {pilau rice} (conf: 0.640, supp: 0.114, lift: 1.504, conv: 1.597)
{pilau rice} -> {bombay aloo} (conf: 0.264, supp: 0.112, lift: 1.255, conv: 1.073)
{bombay aloo} -> {pilau rice} (conf: 0.534, supp: 0.112, lift: 1.255, conv: 1

In [15]:
# Print out every rule with a specific dish on the left hand side, sorted by lift
rules_filtered = filter(lambda rule:  rule.lhs == ('garlic naan',), rules)
for rule in sorted(rules_filtered, key=lambda rule: rule.lift, reverse=True):
  print(rule)

{garlic naan} -> {chicken tikka masala} (conf: 0.205, supp: 0.041, lift: 1.156, conv: 1.035)
{garlic naan} -> {plain papadum} (conf: 0.371, supp: 0.074, lift: 1.135, conv: 1.070)
{garlic naan} -> {pilau rice} (conf: 0.464, supp: 0.092, lift: 1.089, conv: 1.070)


In [19]:
# Print out every rule that includes a specific dish on the left hand side, sorted by confidence
rules_filtered = filter(lambda rule: 'garlic naan' in rule.lhs, rules)
for rule in sorted(rules_filtered, key=lambda rule: rule.confidence, reverse=True):
  print(rule)

{garlic naan, mango chutney, onion chutney} -> {plain papadum} (conf: 0.966, supp: 0.011, lift: 2.951, conv: 19.513)
{garlic naan, mango chutney, mini bhaji} -> {plain papadum} (conf: 0.964, supp: 0.011, lift: 2.948, conv: 18.840)
{garlic naan, mango chutney, red sauce} -> {plain papadum} (conf: 0.962, supp: 0.012, lift: 2.942, conv: 17.868)
{garlic naan, mango chutney, mint sauce} -> {plain papadum} (conf: 0.938, supp: 0.012, lift: 2.869, conv: 10.934)
{bombay aloo, garlic naan, mango chutney} -> {plain papadum} (conf: 0.932, supp: 0.011, lift: 2.849, conv: 9.883)
{garlic naan, mango chutney, pilau rice} -> {plain papadum} (conf: 0.912, supp: 0.020, lift: 2.789, conv: 7.674)
{garlic naan, mango chutney} -> {plain papadum} (conf: 0.908, supp: 0.036, lift: 2.776, conv: 7.317)
{garlic naan, onion chutney} -> {plain papadum} (conf: 0.893, supp: 0.015, lift: 2.729, conv: 6.280)
{garlic naan, mint sauce} -> {plain papadum} (conf: 0.862, supp: 0.021, lift: 2.634, conv: 4.861)
{garlic naan, m

In [20]:
# Print out every rule with 1 item on the left hand side and 1 item on the right hand side, sorted by lift
rules_filtered = filter(lambda rule: len(rule.lhs) == 1 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_filtered, key=lambda rule: rule.lift, reverse=True):
  print(rule)

{chicken chaat (main)} -> {bombay aloo} (conf: 0.691, supp: 0.017, lift: 3.291, conv: 2.558)
{onion chutney} -> {mango chutney} (conf: 0.560, supp: 0.052, lift: 3.207, conv: 1.877)
{mango chutney} -> {onion chutney} (conf: 0.296, supp: 0.052, lift: 3.207, conv: 1.289)
{red sauce} -> {onion chutney} (conf: 0.292, supp: 0.022, lift: 3.161, conv: 1.282)
{onion chutney} -> {red sauce} (conf: 0.234, supp: 0.022, lift: 3.161, conv: 1.208)
{lime pickle} -> {mango chutney} (conf: 0.543, supp: 0.011, lift: 3.106, conv: 1.805)
{onion chutney} -> {mint sauce} (conf: 0.387, supp: 0.036, lift: 3.048, conv: 1.425)
{mint sauce} -> {onion chutney} (conf: 0.281, supp: 0.036, lift: 3.048, conv: 1.263)
{keema rice} -> {keema naan} (conf: 0.377, supp: 0.011, lift: 3.007, conv: 1.403)
{red sauce} -> {mint sauce} (conf: 0.372, supp: 0.028, lift: 2.930, conv: 1.391)
{mint sauce} -> {red sauce} (conf: 0.217, supp: 0.028, lift: 2.930, conv: 1.182)
{red sauce} -> {mango chutney} (conf: 0.509, supp: 0.038, lift: