In [3]:

import pandas as pd

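# Load dataset. Note: read_excel treats the first row as a header by default;
# if the sheet has no header row, pass header=None so the first transaction
# is not consumed as the column name.
df = pd.read_excel("Online retail.xlsx")

# The sheet holds a single column of comma-separated items; name it "items"
df.columns = ["items"]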

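# Split each row into a list of items, trimming stray whitespace
# and dropping empty entries left by trailing commas
transactions = df["items"].apply(
    lambda x: [item.strip() for item in x.split(",") if item.strip()]
)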

# Convert to list of lists
transactions_list = transactions.tolist()


In [5]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Convert to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions_list).transform(transactions_list)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

# Apply Apriori
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

rules.sort_values("confidence", ascending=False).head(10)


| index | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 346 | (eggs, ground beef) | (mineral water) | 0.02 | 0.238267 | 0.010133 | 0.506667 | 2.126469 | 1.0 | 0.005368 | 1.544054 | 0.540548 | 0.040838 | 0.352354 | 0.274598 |
| 380 | (milk, ground beef) | (mineral water) | 0.022 | 0.238267 | 0.011067 | 0.50303 | 2.111207 | 1.0 | 0.005825 | 1.532756 | 0.538177 | 0.044409 | 0.34758 | 0.274738 |
| 321 | (chocolate, ground beef) | (mineral water) | 0.023067 | 0.238267 | 0.010933 | 0.473988 | 1.989319 | 1.0 | 0.005437 | 1.44813 | 0.509058 | 0.043663 | 0.309454 | 0.259938 |
| 367 | (frozen vegetables, milk) | (mineral water) | 0.0236 | 0.238267 | 0.011067 | 0.468927 | 1.968075 | 1.0 | 0.005444 | 1.434328 | 0.503778 | 0.044125 | 0.302809 | 0.257687 |
| 275 | (soup) | (mineral water) | 0.050533 | 0.238267 | 0.023067 | 0.456464 | 1.915771 | 1.0 | 0.011026 | 1.401441 | 0.503458 | 0.086804 | 0.286449 | 0.276637 |
| 404 | (pancakes, spaghetti) | (mineral water) | 0.0252 | 0.238267 | 0.011467 | 0.455026 | 1.909736 | 1.0 | 0.005462 | 1.397744 | 0.488682 | 0.045503 | 0.284561 | 0.251576 |
| 398 | (olive oil, spaghetti) | (mineral water) | 0.022933 | 0.238267 | 0.010267 | 0.447674 | 1.87888 | 1.0 | 0.004802 | 1.379138 | 0.478747 | 0.040914 | 0.27491 | 0.245382 |
| 392 | (spaghetti, milk) | (mineral water) | 0.035467 | 0.238267 | 0.015733 | 0.443609 | 1.861817 | 1.0 | 0.007283 | 1.369061 | 0.479911 | 0.060982 | 0.269572 | 0.254821 |
| 327 | (chocolate, milk) | (mineral water) | 0.032133 | 0.238267 | 0.014 | 0.435685 | 1.828559 | 1.0 | 0.006344 | 1.349836 | 0.468165 | 0.054602 | 0.259169 | 0.247221 |
| 386 | (spaghetti, ground beef) | (mineral water) | 0.0392 | 0.238267 | 0.017067 | 0.435374 | 1.827256 | 1.0 | 0.007727 | 1.349094 | 0.471202 | 0.06554 | 0.258762 | 0.253501 |


## Analysis and Interpretation of Association Rules

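These rules show which products customers tend to buy together. In the ten highest-confidence rules above, the consequent is always mineral water, which on its own appears in about 24% of baskets (consequent support 0.238).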

### 1. Overall Observations
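- Rules with **lift > 1** indicate a positive association: the items occur together more often than they would if bought independently. The top ten rules here have lift between about 1.8 and 2.1.
- Rules with **high confidence** are more reliable predictors: confidence is the share of baskets containing the antecedent that also contain the consequent. The best rule here reaches about 0.51, so roughly half of those baskets include the consequent.
- High-support rules represent **popular combinations** in customer baskets. With `min_support=0.01`, every rule covers at least 1% of transactions.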

### 2. Interpretation of Rules

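- **Rule Example:** `{soup} -> {mineral water}` (confidence 0.46, lift 1.92)  
  - *Meaning:* About 46% of baskets containing soup also contain mineral water, nearly twice the 24% baseline rate for mineral water.  
  - *Insight:* These items are frequently bought together. Consider placing them near each other or promoting them as a bundle.

- **Rule Example:** `{eggs, ground beef} -> {mineral water}` (confidence 0.51, lift 2.13)  
  - *Meaning:* When customers purchase eggs and ground beef together, about half of them also add mineral water.  
  - *Insight:* Suggest mineral water during checkout or offer a combo deal.

- **Rule Example:** `{milk, ground beef} -> {mineral water}` with lift 2.11  
  - *Meaning:* Baskets with milk and ground beef contain mineral water about 2.1 times as often as the average basket. This is an association, not evidence that one purchase causes the other.  
  - *Insight:* The antecedent items can trigger targeted recommendations or cross-selling.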

### 3. Customer Purchasing Behaviour
- Customers tend to buy **complementary items**, which opens up opportunities for combo offers.
- Some products appear repeatedly in strong rules, indicating **key driver products**. In the top ten rules, mineral water is always the consequent, while ground beef, milk, and spaghetti each appear in four antecedents.
- High-confidence rules reveal **predictable buying behaviour**, useful for recommendation systems.

### 4. Business Recommendations
- Place strongly associated items **next to each other** in the store or online.
- Create **bundle discounts** for frequently co-purchased products.
- Use the rules to build **recommendation systems** that suggest items during checkout.
- Keep high-support item combinations **consistently stocked**.
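
The recommendation idea above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production recommender: the `RULES` list holds three rows copied from the rules table, and the `recommend` helper and its `top_n` parameter are hypothetical names introduced here.

```python
# Three rules copied from the table above as (antecedent, consequent, confidence).
RULES = [
    (frozenset({"eggs", "ground beef"}), frozenset({"mineral water"}), 0.506667),
    (frozenset({"milk", "ground beef"}), frozenset({"mineral water"}), 0.503030),
    (frozenset({"soup"}), frozenset({"mineral water"}), 0.456464),
]


def recommend(basket, rules=RULES, top_n=3):
    """Suggest items whose rule antecedent is fully contained in the basket."""
    basket = set(basket)
    best = {}  # item -> highest confidence among the rules that fired
    for antecedent, consequent, conf in rules:
        if antecedent <= basket:
            for item in consequent - basket:  # skip items already in the basket
                best[item] = max(best.get(item, 0.0), conf)
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


print(recommend(["eggs", "ground beef", "milk"]))  # [('mineral water', 0.506667)]
print(recommend(["soup", "mineral water"]))        # [] (already in the basket)
```

With the full `rules` DataFrame, the same tuples could be built from its `antecedents`, `consequents`, and `confidence` columns. Ranking by confidence is one choice; ranking by lift would favour less popular items.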



# Interview Questions & Answers

**1. What is lift and why is it important in Association rules?**  
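Lift = confidence(X → Y) / support(Y), which is equivalent to support(X ∪ Y) / (support(X) × support(Y)). Lift > 1 means X and Y occur together more often than expected if they were independent, lift = 1 indicates independence, and lift < 1 means they co-occur less often than expected. It matters because confidence alone can look high simply because Y is popular; lift corrects for Y's baseline frequency.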
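
As a quick check of the formula, the sketch below recomputes confidence and lift for the first row of the rules table (`{eggs, ground beef} -> {mineral water}`). The inputs are the rounded values as displayed, so the results agree with the table only to about three or four decimal places.

```python
# Values copied from the first row of the rules table; rounded as displayed.
antecedent_support = 0.02      # support(X), X = {eggs, ground beef}
consequent_support = 0.238267  # support(Y), Y = {mineral water}
joint_support = 0.010133       # support(X and Y together)

confidence = joint_support / antecedent_support
lift = confidence / consequent_support
# Equivalent form: observed joint support over the support expected
# if X and Y were independent.
lift_alt = joint_support / (antecedent_support * consequent_support)

print(f"confidence = {confidence:.3f}")
print(f"lift       = {lift:.3f}")
```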

**2. What is support and confidence? How do you calculate them?**  
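- Support(X ∪ Y) = count(X ∪ Y) / total transactions, where count(X ∪ Y) is the number of transactions containing every item in X and Y. It measures how frequently the itemset appears.  
- Confidence(X → Y) = count(X ∪ Y) / count(X). It estimates the probability that Y is purchased given that X is purchased.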
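
Both definitions can be computed directly from raw baskets. The following is a small sketch on made-up toy data; the `baskets` list and the helper names are illustrative and not part of the notebook's dataset.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of antecedent-containing transactions that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / count_containing(transactions, antecedent)


# Toy data for illustration only.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
    ["milk", "bread", "butter"],
]

print(support(baskets, {"milk", "bread"}))       # 0.6  (3 of 5 baskets)
print(confidence(baskets, {"milk"}, {"bread"}))  # 0.75 (3 of the 4 milk baskets)
```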

**3. What are some limitations of Association Rules Mining?**  
- Requires careful threshold tuning (support/confidence/lift).
- Sparsity in large item universes reduces meaningful rules.
- Does not imply causation — only association.
- For continuous data, requires binning or discretization.
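
To illustrate the last point, a continuous attribute has to be turned into a categorical item before it can take part in a rule. The sketch below bins a hypothetical basket total into `low`/`medium`/`high`; the cut points, labels, and the `spend` attribute are assumptions made up for this example.

```python
from bisect import bisect_right

EDGES = [20.0, 50.0]               # hypothetical cut points
LABELS = ["low", "medium", "high"]  # one more label than there are edges


def to_item(value, prefix="spend", edges=EDGES, labels=LABELS):
    """Map a numeric value to a categorical item such as 'spend=low'."""
    return f"{prefix}={labels[bisect_right(edges, value)]}"


# Each basket total becomes an extra item that can be appended to its transaction.
totals = [12.5, 20.0, 35.0, 80.0]
print([to_item(t) for t in totals])
# ['spend=low', 'spend=medium', 'spend=medium', 'spend=high']
```

For a whole DataFrame column, `pd.cut` does the same job. The choice of bin edges changes which itemsets clear the support threshold, so it is another parameter to tune.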
