1. Loading the dataset

In [11]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset
df = pd.read_excel("Online retail.xlsx", header=None)

Data Preprocessing

In [13]:
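# Each row holds one transaction as a comma-separated string of items.
# Skip empty rows and strip whitespace so "milk" and " milk" count as one item.
transactions = [
    [item.strip() for item in str(row[0]).split(',') if item.strip()]
    for row in df.itertuples(index=False)
    if pd.notna(row[0])
]

# One-hot encode the transactions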
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
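To make the encoding step concrete, the sketch below reproduces the idea behind `TransactionEncoder` in plain Python on three made-up rows (the basket contents are hypothetical, not taken from the dataset). Each transaction becomes a row of booleans with one column per distinct item, and, like `TransactionEncoder`, the columns are sorted alphabetically. The `parse` helper is a hypothetical name; it also shows why stripping whitespace matters: without `.strip()`, `" chocolate"` and `"chocolate"` would end up as two different columns.

```python
# Hypothetical raw rows, one comma-separated transaction per row
raw_rows = ["spaghetti,ground beef", "milk, chocolate", "milk,eggs, chocolate"]


def parse(row: str):
    """Split a raw row on commas, strip whitespace and drop empty items."""
    return [item.strip() for item in row.split(",") if item.strip()]


toy_transactions = [parse(r) for r in raw_rows]

# Columns: every distinct item, sorted alphabetically
columns = sorted({item for basket in toy_transactions for item in basket})

# One boolean row per transaction: True where that item is in the basket
encoded = [[item in set(basket) for item in columns] for basket in toy_transactions]

print(columns)
for row in encoded:
    print(row)
```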

Generate frequent itemsets using Apriori

In [14]:
frequent_itemsets = apriori(df_encoded, min_support=0.03, use_colnames=True)

Generate association rules

In [15]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules = rules.sort_values(['lift', 'confidence'], ascending=[False, False])

Display the top rules

In [16]:
print("Top Association Rules:")
print(rules[['antecedents','consequents','support','confidence','lift']])

Top Association Rules:
            antecedents          consequents   support  confidence      lift
15          (spaghetti)        (ground beef)  0.039195    0.225115  2.291162
14        (ground beef)          (spaghetti)  0.039195    0.398915  2.291162
12      (mineral water)        (ground beef)  0.040928    0.171700  1.747522
13        (ground beef)      (mineral water)  0.040928    0.416554  1.747522
11  (frozen vegetables)      (mineral water)  0.035729    0.374825  1.572463
10      (mineral water)  (frozen vegetables)  0.035729    0.149888  1.572463
19          (spaghetti)               (milk)  0.035462    0.203675  1.571779
18               (milk)          (spaghetti)  0.035462    0.273663  1.571779
16               (milk)      (mineral water)  0.047994    0.370370  1.553774
17      (mineral water)               (milk)  0.047994    0.201342  1.553774
2                (milk)          (chocolate)  0.032129    0.247942  1.513276
3           (chocolate)               (milk)  0.03212

**Analysis and Interpretation**

**1. Observations from the rules**

*a) Spaghetti and Ground Beef*

(spaghetti) → (ground beef) confidence = 0.225, lift = 2.29

(ground beef) → (spaghetti) confidence = 0.399, lift = 2.29

Interpretation:

Customers who buy ground beef also buy spaghetti about 40% of the time.

A lift of 2.29 means spaghetti and ground beef are bought together about 2.3 times as often as they would be if they were purchased independently, a strong positive association.

This suggests a meal-combination pattern, such as buying the ingredients for a pasta dish.
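The three numbers in each row are not independent, so they can be cross-checked. Assuming only the relationships Confidence(X → Y) = Support(X ∪ Y) / Support(X) and Lift = Support(X ∪ Y) / (Support(X) · Support(Y)), the sketch below backs out each item's individual support from rows 15 and 14 and recomputes the lift. The per-item supports are derived here; the notebook does not print them.

```python
# Values copied from the rules table above (rows 15 and 14)
support_both = 0.039195            # support of {spaghetti, ground beef}
conf_spaghetti_to_beef = 0.225115  # confidence(spaghetti -> ground beef)
conf_beef_to_spaghetti = 0.398915  # confidence(ground beef -> spaghetti)

# confidence(X -> Y) = support(X, Y) / support(X), so support(X) = support(X, Y) / confidence
support_spaghetti = support_both / conf_spaghetti_to_beef
support_beef = support_both / conf_beef_to_spaghetti

# lift = support(X, Y) / (support(X) * support(Y)), symmetric in X and Y
lift = support_both / (support_spaghetti * support_beef)

print(f"support(spaghetti)   ~ {support_spaghetti:.3f}")
print(f"support(ground beef) ~ {support_beef:.3f}")
print(f"lift                 ~ {lift:.2f}")
```

The recomputed lift matches the 2.29 shared by rows 15 and 14: lift is symmetric in X and Y, while confidence is not.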

*b) Mineral Water associations*

(mineral water) → (ground beef) lift = 1.75

(frozen vegetables) → (mineral water) lift = 1.57

(milk) → (mineral water) lift = 1.55

(pancakes) → (mineral water) lift = 1.48

Interpretation:

Mineral water is co-purchased with many different products.

A lift above 1 means that buying one item of the pair makes buying the other more likely than its baseline rate. This may reflect health-conscious shopping, or simply that water is a staple.
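The table also reveals how common mineral water is on its own. The rough check below uses only values copied from the printed rules: it divides each pair's support by the confidence of the matching "mineral water → item" rule to estimate Support(mineral water), then recomputes the lift of the reverse rules. Averaging the three estimates is just a convenience for this illustration.

```python
# Rules with mineral water as the antecedent, copied from the table above:
# consequent -> (support of the pair, confidence of "mineral water -> consequent")
water_as_antecedent = {
    "ground beef": (0.040928, 0.171700),
    "frozen vegetables": (0.035729, 0.149888),
    "milk": (0.047994, 0.201342),
}

# confidence(water -> Y) = support(water, Y) / support(water),
# so each row gives an estimate of support(water); average the three
estimates = [pair / conf for pair, conf in water_as_antecedent.values()]
support_water = sum(estimates) / len(estimates)
print(f"support(mineral water) ~ {support_water:.3f}")

# Rules with mineral water as the consequent: lift = confidence / support(water)
conf_to_water = {"ground beef": 0.416554, "frozen vegetables": 0.374825, "milk": 0.370370}
for item, conf in conf_to_water.items():
    print(f"{item} -> mineral water: confidence {conf:.2f}, lift ~ {conf / support_water:.2f}")
```

The estimate comes out at roughly 0.24, so mineral water appears in about a quarter of all transactions. That is why ground beef → mineral water can reach 42% confidence while its lift stays at 1.75: much of that confidence is baseline popularity, which lift divides out.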

*c) Milk associations*

(milk) → (chocolate) lift = 1.51

(milk) → (eggs) lift = 1.32

Interpretation:

Milk is commonly bought with chocolate and eggs.

This could indicate breakfast-related purchases or baking ingredients.

*d) Chocolate associations*

(chocolate) → (milk) lift = 1.51

(chocolate) → (french fries) lift = 1.23

(chocolate) → (mineral water) lift = 1.35

Interpretation:

Chocolate is bought alongside both drinks (milk, water) and snack items (french fries).

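This suggests customers combine treats with staple beverages, although these lifts (1.23 to 1.51) point to weaker associations than spaghetti and ground beef.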

**2. Key Patterns**

Meal-oriented combinations:

Spaghetti + Ground Beef

Frozen vegetables + Mineral Water

Pancakes + Mineral Water

Breakfast/baking patterns:

Milk + Chocolate

Milk + Eggs

Staple items appear frequently:

Mineral water shows up in multiple rules, indicating it’s commonly purchased with a wide variety of products.

Snacks & treats co-purchases:

Chocolate + French Fries, Chocolate + Milk: customers often buy indulgent items together.
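A quick way to check which items act as hubs is to count how many rules each item takes part in. The sketch below does this for the rows visible in the printed output above (the listing is cut off, so rules such as pancakes → mineral water are not included). With the real mlxtend result, the same pairs could be built from `zip(rules['antecedents'], rules['consequents'])`, whose entries are frozensets.

```python
from collections import Counter

# (antecedent, consequent) pairs copied from the visible rows of the rules table
rule_pairs = [
    ({"spaghetti"}, {"ground beef"}),
    ({"ground beef"}, {"spaghetti"}),
    ({"mineral water"}, {"ground beef"}),
    ({"ground beef"}, {"mineral water"}),
    ({"frozen vegetables"}, {"mineral water"}),
    ({"mineral water"}, {"frozen vegetables"}),
    ({"spaghetti"}, {"milk"}),
    ({"milk"}, {"spaghetti"}),
    ({"milk"}, {"mineral water"}),
    ({"mineral water"}, {"milk"}),
    ({"milk"}, {"chocolate"}),
    ({"chocolate"}, {"milk"}),
]

# Count how many rules each item appears in, on either side of the arrow
appearances = Counter(item for ante, cons in rule_pairs for item in ante | cons)
for item, n in appearances.most_common():
    print(f"{item:<18} {n}")
```

In these visible rows, mineral water and milk each appear in six rules, ahead of spaghetti and ground beef with four each.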

**3. Insights for customer behavior**

Meal planning: Customers buy ingredients together for specific meals (pasta, baking, breakfast). Retailers can bundle such items or recommend them online.

Promotions: Item pairs with high lift (like spaghetti + ground beef) are excellent candidates for cross-selling or combo discounts.

Health and staples: Mineral water is frequently paired with other products, which suggests it is a staple, possibly for health-conscious shoppers.

Product placement: The fact that chocolate and milk are frequently bought together suggests placing them near each other in-store or promoting them together online.
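For the promotion idea, one detail matters: lift is symmetric, so X → Y and Y → X always describe the same item pair with the same lift. The sketch below keeps one entry per pair from the visible rules and ranks the pairs that clear a cut-off. `MIN_BUNDLE_LIFT = 1.7` is a hypothetical threshold chosen for illustration, not a value from the analysis.

```python
# (antecedent, consequent, lift) rows copied from the printed rules table above
rule_rows = [
    ("spaghetti", "ground beef", 2.291162),
    ("ground beef", "spaghetti", 2.291162),
    ("mineral water", "ground beef", 1.747522),
    ("ground beef", "mineral water", 1.747522),
    ("frozen vegetables", "mineral water", 1.572463),
    ("mineral water", "frozen vegetables", 1.572463),
    ("spaghetti", "milk", 1.571779),
    ("milk", "spaghetti", 1.571779),
    ("milk", "mineral water", 1.553774),
    ("mineral water", "milk", 1.553774),
    ("milk", "chocolate", 1.513276),
]

MIN_BUNDLE_LIFT = 1.7  # hypothetical cut-off for a combo discount

# X -> Y and Y -> X are the same pair, so key on an unordered frozenset
bundles = {}
for ante, cons, lift in rule_rows:
    if lift >= MIN_BUNDLE_LIFT:
        pair = frozenset((ante, cons))
        bundles[pair] = max(lift, bundles.get(pair, 0.0))

# Highest-lift bundles first
for pair, lift in sorted(bundles.items(), key=lambda kv: kv[1], reverse=True):
    print(" + ".join(sorted(pair)), f"(lift {lift:.2f})")
```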

**4. Recommendations for retail/business**

Bundle offers: Create combo offers for high-lift product pairs, e.g., spaghetti + ground beef.

Cross-selling: Suggest associated items at checkout (e.g., milk + chocolate).

Store layout: Place frequently co-purchased items together to increase convenience and sales.

Marketing campaigns: Highlight meal ideas or breakfast packs using observed associations.
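The cross-selling recommendation can be sketched as a lookup: given what is already in a basket, collect the consequents of every rule whose antecedent is present and rank them. The `suggest` helper below is hypothetical; it uses the rule rows visible in the printed table and ranks by lift, then confidence, mirroring the notebook's sort order. It only handles single-item antecedents, which is all the visible rows contain.

```python
# (antecedent, consequent, confidence, lift) rows copied from the rules table above
rule_rows = [
    ("spaghetti", "ground beef", 0.225115, 2.291162),
    ("ground beef", "spaghetti", 0.398915, 2.291162),
    ("mineral water", "ground beef", 0.171700, 1.747522),
    ("ground beef", "mineral water", 0.416554, 1.747522),
    ("frozen vegetables", "mineral water", 0.374825, 1.572463),
    ("mineral water", "frozen vegetables", 0.149888, 1.572463),
    ("spaghetti", "milk", 0.203675, 1.571779),
    ("milk", "spaghetti", 0.273663, 1.571779),
    ("milk", "mineral water", 0.370370, 1.553774),
    ("mineral water", "milk", 0.201342, 1.553774),
    ("milk", "chocolate", 0.247942, 1.513276),
]


def suggest(basket, rows, top_n=3):
    """Hypothetical helper: up to top_n items not yet in basket, ranked by the best
    (lift, confidence) of any rule whose antecedent is already in basket."""
    best = {}
    for ante, cons, conf, lift in rows:
        if ante in basket and cons not in basket:
            best[cons] = max(best.get(cons, (0.0, 0.0)), (lift, conf))
    return sorted(best, key=best.get, reverse=True)[:top_n]


print(suggest({"milk"}, rule_rows))
print(suggest({"ground beef", "spaghetti"}, rule_rows))
```

For a basket containing only milk it suggests spaghetti, mineral water and chocolate, in that order; items already in the basket are never suggested.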

Interview Questions

**1. What is lift, and why is it important in association rules?**

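Lift is a measure in association rule mining that indicates how much more likely the consequent (Y) is to be purchased when the antecedent (X) is purchased, compared with how likely Y would be if it were purchased independently of X:

$$\text{Lift}(X \to Y) = \frac{\text{Confidence}(X \to Y)}{\text{Support}(Y)} = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)\,\text{Support}(Y)}$$

A lift of 1 means X and Y behave as if independent, a lift above 1 means buying X makes Y more likely, and a lift below 1 means it makes Y less likely.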

Importance:

Helps identify strong and meaningful associations beyond simple frequency.

Indicates whether the presence of X actually increases the likelihood of Y, which is critical for marketing strategies, product placement, or recommendation systems.

Avoids misleading rules that may appear confident but occur due to high baseline frequency of Y.
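The point about baseline frequency is easiest to see on a small example. The baskets below are hypothetical: water is in 8 of 10 baskets, so a rule pointing at water looks confident almost by default. The helper functions compute support, confidence and lift as defined in these answers.

```python
# Hypothetical toy baskets (not from the notebook's dataset)
toy_baskets = [
    {"bread", "water"}, {"bread", "water"}, {"chips", "water"},
    {"chips", "water"}, {"chips", "water"}, {"chips", "salsa"},
    {"chips", "salsa"}, {"water"}, {"water"}, {"bread", "water"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(x, y, baskets):
    return support(x | y, baskets) / support(x, baskets)


def lift(x, y, baskets):
    # Confidence relative to how often y is bought anyway
    return confidence(x, y, baskets) / support(y, baskets)


for x, y in [({"chips"}, {"water"}), ({"chips"}, {"salsa"})]:
    print(f"{x} -> {y}: confidence={confidence(x, y, toy_baskets):.2f}, "
          f"lift={lift(x, y, toy_baskets):.2f}")
```

Here chips → water reaches 60% confidence but has a lift of 0.75, so buying chips actually makes water less likely than its 80% baseline. Chips → salsa has only 40% confidence yet a lift of 2.0, making it the more informative rule.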

**2. What are support and confidence, and how do you calculate them?**

Support measures how frequently an itemset occurs in the dataset. It reflects the overall prevalence of an item or combination of items.

Confidence measures the likelihood of occurrence of the consequent (Y) given that the antecedent (X) has occurred. It reflects the strength of the rule.

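$$\text{Support}(X \to Y) = \text{Support}(X \cup Y) = \frac{\text{Number of transactions containing both } X \text{ and } Y}{\text{Total number of transactions}}$$

$$\text{Confidence}(X \to Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}$$

Here $X \cup Y$ denotes the itemset containing all items of both X and Y.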
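A small worked example of both formulas, on hypothetical baskets: two of the five baskets contain both milk and chocolate, so Support(milk → chocolate) = 2/5 = 0.4. Milk appears in three baskets (0.6) and chocolate in four (0.8), so the two confidences differ even though the support is shared.

```python
# Hypothetical baskets, for illustration only
toy_baskets = [
    {"milk", "chocolate", "eggs"},
    {"milk", "chocolate"},
    {"milk", "eggs"},
    {"chocolate", "spaghetti"},
    {"chocolate"},
]


def support(itemset, baskets):
    """Baskets containing every item in itemset / total number of baskets."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support(X u Y) / Support(X)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


x, y = {"milk"}, {"chocolate"}
print("Support(milk -> chocolate)    =", support(x | y, toy_baskets))
print("Confidence(milk -> chocolate) =", round(confidence(x, y, toy_baskets), 2))
print("Confidence(chocolate -> milk) =", round(confidence(y, x, toy_baskets), 2))
```

The output gives a confidence of about 0.67 for milk → chocolate but 0.5 for chocolate → milk: support is symmetric, confidence is not.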


**3. What are some limitations or challenges of association rule mining?**

Large Number of Rules:

Mining large datasets can generate thousands or millions of rules, many of which may be redundant or insignificant.
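Part of the explosion comes from rule generation itself: a single frequent itemset of size k can be split into 2^k − 2 rules, since every non-empty proper subset can serve as the antecedent. The sketch below enumerates those splits with `itertools.combinations`; the helper name `rules_from_itemset` and the example itemset are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Yield every (antecedent, consequent) split with both sides non-empty."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for ante in combinations(items, k):
            cons = tuple(i for i in items if i not in ante)
            yield ante, cons


# One hypothetical frequent 4-itemset already yields 2**4 - 2 = 14 candidate rules
itemset = {"milk", "eggs", "chocolate", "spaghetti"}
candidate_rules = list(rules_from_itemset(itemset))
print(len(candidate_rules))

# The count roughly doubles with every extra item in the itemset
for k in range(2, 11, 2):
    print(f"{k}-itemset -> {2**k - 2} candidate rules")
```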

High Computational Cost:

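With n distinct items there are 2^n − 1 possible non-empty itemsets, so generating frequent itemsets and computing support, confidence, and lift can be computationally expensive, especially for large datasets. Apriori limits the work by pruning candidates that contain an infrequent subset, but the classic algorithm still makes one pass over the data for each itemset size.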
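The pruning that keeps Apriori tractable rests on downward closure: if an itemset is infrequent, every superset of it is infrequent too, so candidates containing an infrequent subset can be discarded before their support is counted. The following is a simplified teaching sketch of that level-wise idea in plain Python, not mlxtend's implementation; `apriori_sketch` and the baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Simplified level-wise Apriori; returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item of itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([i]) for b in baskets for i in b}
    level = {c for c in singles if support(c) >= min_support}
    frequent = {c: support(c) for c in level}

    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into size-k candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (downward closure): drop candidates with an infrequent (k-1)-subset,
        # so their support is never counted
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in level)
        k += 1
    return frequent


# Hypothetical baskets for illustration
toy_baskets = [
    {"milk", "chocolate", "eggs"},
    {"milk", "chocolate"},
    {"milk", "eggs"},
    {"spaghetti", "ground beef"},
    {"spaghetti", "ground beef", "milk"},
]
result = apriori_sketch(toy_baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.4`, the candidate {milk, chocolate, eggs} is discarded without being counted, because its subset {chocolate, eggs} appears in only one of the five baskets.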

Sparsity of Data:

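When there are many distinct items and each appears in only a small fraction of transactions, most items and item combinations fall below the minimum support threshold, so meaningful associations are hard to find.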

Choice of Thresholds:

The quality of rules depends heavily on the minimum support and confidence thresholds. Thresholds set too high may miss useful rules; thresholds set too low may produce many irrelevant ones.
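Threshold sensitivity is easy to demonstrate even on a handful of baskets. The brute-force counter below uses hypothetical data and enumerates every possible itemset, which is only feasible for a tiny set of items; it reports how many itemsets reach each minimum support.

```python
from itertools import combinations

# Hypothetical baskets for illustration
toy_baskets = [
    {"milk", "chocolate", "eggs"},
    {"milk", "chocolate"},
    {"milk", "eggs"},
    {"spaghetti", "ground beef"},
    {"spaghetti", "ground beef", "milk"},
    {"mineral water", "milk"},
]
items = sorted(set().union(*toy_baskets))


def count_frequent(min_support):
    """Brute force: count all itemsets whose support is at least min_support."""
    n = len(toy_baskets)
    total = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(set(combo) <= b for b in toy_baskets) / n
            if s >= min_support:
                total += 1
    return total


for threshold in (0.5, 0.3, 0.15):
    print(f"min_support={threshold}: {count_frequent(threshold)} frequent itemsets")
```

On these six baskets, lowering `min_support` from 0.5 to 0.3 to 0.15 raises the number of frequent itemsets from 1 to 8 to 15.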

Ignores Sequential Information:

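Classic association rules treat each transaction as an unordered set of items, so they do not capture the order in which items are bought or in which a customer's transactions occur. Temporal or sequential patterns require sequential pattern mining instead.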
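A two-customer example shows what is lost. The purchase histories below are hypothetical; Apriori sees only the collection of baskets, so both customers contribute identical input even though one bought milk first and the other bought chocolate first.

```python
from collections import Counter

# Hypothetical purchase histories: one basket per visit, listed in time order
customer_a = [frozenset({"milk"}), frozenset({"chocolate"})]  # milk first
customer_b = [frozenset({"chocolate"}), frozenset({"milk"})]  # chocolate first

# Apriori only sees the collection of baskets, so both inputs look identical
print(Counter(customer_a) == Counter(customer_b))  # True

# Keeping the visit order (what sequential pattern mining works on) tells them apart
print(customer_a == customer_b)  # False
```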

Correlation vs. Causation:

Association rules only indicate correlation, not causation; just because items appear together does not mean one causes the other.

Handling Continuous Data:

Association rule mining works best with categorical data. Continuous/numeric data needs to be discretized, which may lose information.
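Discretization usually means mapping each numeric value to a labelled bin, which then behaves like an item. The sketch below uses `bisect.bisect_right` with hand-picked cut points; the basket totals, edges and labels are all hypothetical. In pandas, `pd.cut` does the same job.

```python
from bisect import bisect_right

# Hypothetical continuous attribute: total spend per basket
basket_totals = [4.5, 12.0, 27.9, 8.3, 55.0]

# Hypothetical cut points: bins are (-inf, 10), [10, 30) and [30, inf)
edges = [10, 30]
labels = ["total=low", "total=medium", "total=high"]

# bisect_right returns how many edges are <= the value, i.e. the bin index
binned = [labels[bisect_right(edges, v)] for v in basket_totals]
print(binned)
```

Note the information loss the limitation refers to: 12.0 and 27.9 become the same item, `total=medium`.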