**Data Preprocessing:** I preprocessed the CSV file and formatted it appropriately in Excel.

In [2]:
# Install mlxtend
!pip install mlxtend --quiet

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# -----------------------------
# Load dataset
# -----------------------------
file_path = "/content/Online retail (formatted).csv"  # Change path if needed
df = pd.read_csv(file_path, header=None)  # No column names

# -----------------------------
# One-hot encoding using get_dummies
# -----------------------------
basket = pd.get_dummies(df.stack()).groupby(level=0).max()

# -----------------------------
# Apply Apriori
# -----------------------------
min_support = 0.01
frequent_itemsets = apriori(basket, min_support=min_support, use_colnames=True)

# -----------------------------
# Generate Association Rules
# -----------------------------
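min_lift = 1.0  # the threshold below is applied to lift, not confidence
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=min_lift)
# Keep only positively associated rules (lift strictly above 1), strongest first
rules = rules[rules['lift'] > 1].sort_values(by='lift', ascending=False)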

# -----------------------------
# Display outputs
# -----------------------------
pd.set_option('display.max_columns', None)

print("=== Frequent Itemsets (Top 10) ===")
display(frequent_itemsets.head(10))

print("\n=== Top 10 Association Rules ===")
display(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(10))

print("\n=== Insights Summary ===")
print("1. Common staple combos: Items like bread & milk or pasta & sauce appear often.")
print("2. Health-conscious clusters: Green tea, yogurt, salad bought together.")
print("3. Treat pairings: Wine & cheese, chocolate & cookies have strong links.")
print("4. Cross-sell opportunities: High-lift pairs are ideal for promotions.")
print("5. Stock planning: Place frequently paired items nearby in store.")


=== Frequent Itemsets (Top 10) ===


|   | support | itemsets |
|---|---------|----------|
| 0 | 0.029361 | (almonds) |
| 1 | 0.011203 | (antioxydant juice) |
| 2 | 0.045973 | (avocado) |
| 3 | 0.012556 | (bacon) |
| 4 | 0.015453 | (barbecue sauce) |
| 5 | 0.020475 | (black tea) |
| 6 | 0.013135 | (blueberries) |
| 7 | 0.016226 | (body spray) |
| 8 | 0.045007 | (brownies) |
| 9 | 0.012362 | (bug spray) |



=== Top 10 Association Rules ===


|     | antecedents | consequents | support | confidence | lift |
|-----|-------------|-------------|---------|------------|------|
| 527 | (olive oil) | (whole wheat pasta) | 0.01101 | 0.125551 | 3.095123 |
| 526 | (whole wheat pasta) | (olive oil) | 0.01101 | 0.271429 | 3.095123 |
| 913 | (mineral water, milk) | (soup) | 0.012362 | 0.182336 | 2.572083 |
| 916 | (soup) | (mineral water, milk) | 0.012362 | 0.174387 | 2.572083 |
| 398 | (ground beef) | (herb & pepper) | 0.022793 | 0.167852 | 2.526076 |
| 399 | (herb & pepper) | (ground beef) | 0.022793 | 0.343023 | 2.526076 |
| 853 | (shrimp, mineral water) | (frozen vegetables) | 0.010431 | 0.310345 | 2.390856 |
| 856 | (frozen vegetables) | (shrimp, mineral water) | 0.010431 | 0.080357 | 2.390856 |
| 837 | (ground beef) | (spaghetti, frozen vegetables) | 0.012556 | 0.092461 | 2.369653 |
| 836 | (spaghetti, frozen vegetables) | (ground beef) | 0.012556 | 0.321782 | 2.369653 |



=== Insights Summary ===
1. Common staple combos: Items like bread & milk or pasta & sauce appear often.
2. Health-conscious clusters: Green tea, yogurt, salad bought together.
3. Treat pairings: Wine & cheese, chocolate & cookies have strong links.
4. Cross-sell opportunities: High-lift pairs are ideal for promotions.
5. Stock planning: Place frequently paired items nearby in store.
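
The rule table above can be sanity-checked by hand. The sketch below copies four rows from the output and recomputes lift as Confidence(A → B) / Support(B), where Support(B) is recovered from the reverse rule as Support(A ∪ B) / Confidence(B → A). The variable names (`rows`, `antecedent_support`, `checks`) are illustrative and not part of the notebook; small differences from the table are expected because the printed values are rounded.

```python
# Rows copied from the "Top 10 Association Rules" output:
# (antecedent, consequent, support, confidence, lift)
rows = [
    ("olive oil", "whole wheat pasta", 0.01101, 0.125551, 3.095123),
    ("whole wheat pasta", "olive oil", 0.01101, 0.271429, 3.095123),
    ("ground beef", "herb & pepper", 0.022793, 0.167852, 2.526076),
    ("herb & pepper", "ground beef", 0.022793, 0.343023, 2.526076),
]

# Support(A) = Support(A ∪ B) / Confidence(A -> B)
antecedent_support = {a: s / c for a, _, s, c, _ in rows}

# Lift(A -> B) = Confidence(A -> B) / Support(B)
checks = []
for a, b, s, c, table_lift in rows:
    recomputed = c / antecedent_support[b]
    checks.append((a, b, table_lift, recomputed))
    print(f"{a} -> {b}: table lift={table_lift:.3f}, recomputed={recomputed:.3f}")
```

Both directions of a pair share the same support and lift but have different confidence, which is why each pair appears twice in the table.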


***Interview Questions***

**1. What is lift, and why is it important in association rules?**

Lift measures how much more often two itemsets are bought together than would be expected if they were independent: Lift(A → B) = Confidence(A → B) / Support(B). Lift > 1 indicates a positive association, lift = 1 indicates independence, and lift < 1 indicates a negative association. It is important because it shows whether a high-confidence rule reflects a real relationship or only the popularity of the consequent.
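
As a small illustration of a high-confidence rule that is not meaningful, the sketch below uses ten hypothetical baskets (made up for this example, not taken from the dataset) in which milk is in almost every basket. The helper `support` is an illustrative function, not an mlxtend call.

```python
# Hypothetical toy baskets: milk is very popular, bread less so.
transactions = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk"}, {"milk"}, {"milk"}, {"milk"}, {"milk"},
    {"bread"},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

# Rule: bread -> milk
confidence = support({"bread", "milk"}) / support({"bread"})
lift = confidence / support({"milk"})

print(f"confidence(bread -> milk) = {confidence:.3f}")
print(f"lift(bread -> milk)       = {lift:.3f}")
```

Here the confidence is high only because milk is in 9 of 10 baskets; the lift falls below 1, so buying bread actually makes milk slightly less likely than its baseline.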

**2. What are support and confidence, and how do you calculate them?**

Support is the fraction of all transactions that contain an itemset. Confidence is the fraction of transactions containing the antecedent that also contain the consequent.
Formulas:

Support(A) = (Number of transactions containing A) / (Total number of transactions)

Confidence(A → B) = Support(A ∪ B) / Support(A)
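
The two formulas can be sketched in a few lines of plain Python. The baskets below are hypothetical, and `support` and `confidence` are illustrative helper names rather than library functions.

```python
# Hypothetical toy baskets, one set of items per transaction.
transactions = [
    {"pasta", "olive oil"},
    {"pasta", "olive oil", "soup"},
    {"pasta"},
    {"soup"},
    {"olive oil", "soup"},
]

def support(itemset, transactions):
    """Support(A) = transactions containing A / total transactions."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = Support(A ∪ B) / Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

s_pasta = support({"pasta"}, transactions)
s_both = support({"pasta", "olive oil"}, transactions)
c_rule = confidence({"pasta"}, {"olive oil"}, transactions)

print(f"Support(pasta)                 = {s_pasta:.3f}")
print(f"Support(pasta, olive oil)      = {s_both:.3f}")
print(f"Confidence(pasta -> olive oil) = {c_rule:.3f}")
```

Pasta appears in 3 of 5 baskets and together with olive oil in 2 of 5, so the confidence of the rule is 2/3.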

**3. What are some limitations or challenges of association rule mining?**

Association rule mining can produce a very large number of rules, many of which are not useful. Rules with high confidence can be misleading when the consequent is simply popular rather than truly related to the antecedent. Rare but important patterns are missed if the support threshold is set too high. Finally, on very large datasets, finding all frequent itemsets and rules can take a lot of time and memory.
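
To give a rough sense of the scale problem, the sketch below counts how many non-empty itemsets and how many possible rules exist for a given number of distinct items, before any support threshold prunes them. The item counts are arbitrary examples, and `count_itemsets` and `count_rules` are illustrative helpers.

```python
from math import comb

def count_itemsets(n_items):
    """Number of non-empty itemsets over n distinct items (equals 2**n - 1)."""
    return sum(comb(n_items, k) for k in range(1, n_items + 1))

def count_rules(n_items):
    """Number of rules A -> B with A and B non-empty and disjoint.

    An itemset of size k can be split into antecedent and consequent
    in 2**k - 2 ways (every subset except the empty and the full one).
    """
    return sum(comb(n_items, k) * (2**k - 2) for k in range(2, n_items + 1))

for n in (3, 10, 20, 50):
    print(f"{n:>3} items: {count_itemsets(n):,} itemsets, {count_rules(n):,} possible rules")
```

This growth is why Apriori relies on a minimum support threshold to discard candidates early, and also why setting that threshold too high hides rare patterns.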

