In [29]:

# Step 1: Install mlxtend (uncomment if it is not already installed)
#!pip install mlxtend

# Step 2: Import libraries
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

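# Step 3: Load the dataset
# Note: read_excel treats the first row as column names by default; pass header=None
# if the file has no header row, otherwise the first transaction is read as a header.
df = pd.read_excel("Online retail.xlsx")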

# Step 4: Split the single column of comma-separated items into a list of item lists
transactions = df.iloc[:, 0].dropna().apply(lambda x: x.split(',')).tolist()

# Step 5: One-hot encode the transactions
te = TransactionEncoder()
te_array = te.fit_transform(transactions)
df_encoded = pd.DataFrame(te_array, columns=te.columns_)

# Step 6: Apply Apriori algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.02, use_colnames=True)

# Step 7: Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Step 8: Keep only rules with confidence >= 0.3 and lift >= 1.5
filtered_rules = rules[(rules['confidence'] >= 0.3) & (rules['lift'] >= 1.5)]

# Step 9: Display the top 10 rules, sorted by lift
(
    filtered_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
    .sort_values(by='lift', ascending=False)
    .head(10)
)


index,antecedents,consequents,support,confidence,lift
66,(ground beef),(spaghetti),0.0392,0.398915,2.290857
89,(olive oil),(spaghetti),0.022933,0.348884,2.003547
81,(soup),(mineral water),0.023067,0.456464,1.915771
1,(burgers),(eggs),0.0288,0.330275,1.837585
94,(tomatoes),(spaghetti),0.020933,0.306043,1.75752
75,(olive oil),(mineral water),0.027467,0.41785,1.753707
65,(ground beef),(mineral water),0.040933,0.416554,1.748266
29,(cooking oil),(mineral water),0.020133,0.394256,1.654683
10,(chicken),(mineral water),0.0228,0.38,1.594852
56,(frozen vegetables),(mineral water),0.035733,0.374825,1.573133


In [None]:
1. What is Lift and why is it important in Association Rules?
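Lift measures how much more likely items A and B are to be bought together than they would be if their purchases were statistically independent: Lift(A -> B) = Confidence(A -> B) / Support(B). A lift of 1 means A and B are independent, a lift above 1 means they co-occur more often than chance, and a lift below 1 means they co-occur less often than chance.
Lift helps filter out misleading rules. A rule can have high confidence simply because its consequent is common in most baskets; lift checks whether the rule actually does better than chance.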
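
As a sanity check on the results table above, lift can be written as confidence divided by the support of the consequent, so dividing confidence by lift should recover the consequent's support. The sketch below does this for the two top rows, which share spaghetti as the consequent. The helper name `implied_consequent_support` and the variable `table_rows` are hypothetical, and the formula is the standard definition of lift rather than something stated in the notebook.

```python
# Confidence and lift copied from the results table above (consequent: spaghetti).
table_rows = {
    "ground beef": {"confidence": 0.398915, "lift": 2.290857},
    "olive oil": {"confidence": 0.348884, "lift": 2.003547},
}


def implied_consequent_support(confidence, lift):
    """Rearranged from lift = confidence / support(consequent)."""
    return confidence / lift


implied = {
    antecedent: implied_consequent_support(row["confidence"], row["lift"])
    for antecedent, row in table_rows.items()
}

for antecedent, value in implied.items():
    print(f"{antecedent} -> spaghetti implies support(spaghetti) = {value:.4f}")
```

Both rows imply a spaghetti support of roughly 0.174, which is what the definition predicts: the consequent's support does not depend on the antecedent.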

2. What is Support and Confidence? How do you calculate them?
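Support:
Support is the fraction of transactions that contain an itemset: Support(A) = (number of transactions containing A) / (total number of transactions).
A minimum support threshold eliminates rare itemsets and reduces computation by focusing on frequently occurring patterns.
Confidence:
Confidence is the fraction of transactions containing A that also contain B: Confidence(A -> B) = Support(A and B together) / Support(A).
It shows the strength of the implication, that is, how reliably A leads to B.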
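
The two measures can be computed by hand on a small example. The sketch below uses four made-up baskets (`toy_baskets` is hypothetical data, not taken from the dataset above) and two helper functions, `support` and `confidence`, written here purely for illustration; they are not part of mlxtend.

```python
# Hypothetical toy baskets, not taken from the dataset used above.
toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


support_bread = support({"bread"}, toy_baskets)               # 3 of 4 baskets
support_bread_milk = support({"bread", "milk"}, toy_baskets)  # 2 of 4 baskets
conf_bread_milk = confidence({"bread"}, {"milk"}, toy_baskets)

print(f"support(bread) = {support_bread:.2f}")
print(f"support(bread, milk) = {support_bread_milk:.2f}")
print(f"confidence(bread -> milk) = {conf_bread_milk:.2f}")
```

Here bread appears in three of four baskets and bread with milk in two, so the confidence of bread -> milk is two thirds.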

3. What are some limitations or challenges of Association Rules Mining?
Too many rules: Low thresholds can produce far more rules than a user can realistically review.
No causation: Association doesn’t imply one item causes the purchase of another.
Rare item problem: Important but rare items may be filtered out by support.
High computation cost: The number of candidate itemsets grows exponentially with the number of distinct items, so mining large datasets is time-consuming.
Threshold tuning: Choosing support, confidence, and lift values is subjective and domain-specific.
Interpretability: As datasets grow, rules become harder to interpret and apply.
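
To give a feel for the "too many rules" and computation-cost points, the sketch below counts how many rules A -> B are possible over n distinct items when A and B are non-empty and disjoint, before any threshold is applied. The closed form 3^n - 2^(n+1) + 1 is a standard counting result, not something from the notebook; the function names are hypothetical, and the brute-force version is included only to cross-check the formula on small inputs.

```python
from itertools import combinations


def count_possible_rules(n_items):
    """Closed form for rules A -> B with A and B non-empty and disjoint."""
    return 3 ** n_items - 2 ** (n_items + 1) + 1


def count_rules_by_enumeration(items):
    """Brute-force count: for each itemset, count every non-empty proper subset as an antecedent."""
    items = list(items)
    total = 0
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            for antecedent_size in range(1, size):
                # Each antecedent fixes the consequent as the rest of the itemset.
                total += sum(1 for _ in combinations(itemset, antecedent_size))
    return total


for n in (3, 6, 10, 20):
    print(f"{n} items -> {count_possible_rules(n)} possible rules")
```

With only 6 items there are already 602 possible rules, which is why minimum support and confidence thresholds are needed to prune the search.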
