**Data Preprocessing**

In [None]:
import pandas as pd
df = pd.read_excel('Online retail.xlsx', header=None, engine='openpyxl')
df.columns = ['Items']

In [None]:
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)

In [None]:
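# Each cell holds one basket as a comma-separated string; split it into a list of trimmed item names
transactions = df['Items'].apply(lambda x: [item.strip() for item in str(x).split(',')])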

In [None]:
transaction_list = transactions.tolist()

In [None]:
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)
basket = pd.DataFrame(te_ary, columns=te.columns_)
print(basket.head())

   almonds  antioxydant juice  asparagus  avocado  babies food  bacon  \
0     True               True      False     True        False  False   
1    False              False      False    False        False  False   
2    False              False      False    False        False  False   
3    False              False      False     True        False  False   
4    False              False      False    False        False  False   

   barbecue sauce  black tea  blueberries  body spray  ...  turkey  \
0           False      False        False       False  ...   False   
1           False      False        False       False  ...   False   
2           False      False        False       False  ...   False   
3           False      False        False       False  ...    True   
4           False      False        False       False  ...   False   

   vegetables mix  water spray  white wine  whole weat flour  \
0            True        False       False              True   
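`TransactionEncoder` one-hot encodes the baskets: one boolean column per distinct item, with the columns in sorted order (as the printed `basket.head()` shows), and one row per transaction. The dependency-free sketch below illustrates the same idea on a few hypothetical baskets. It is not mlxtend's implementation, and `one_hot` is a hypothetical helper name.

```python
# Hypothetical toy baskets (not taken from the Online Retail file)
transactions = [
    ["almonds", "avocado", "vegetables mix"],
    ["burgers", "eggs"],
    ["avocado", "turkey"],
]


def one_hot(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per basket."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        # True where the basket contains the column's item
        rows.append([col in present for col in columns])
    return columns, rows


columns, rows = one_hot(transactions)
print(columns)
for row in rows:
    print(row)
```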

**Association Rule Mining**

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules
# Keep itemsets that appear in at least 5% of transactions
frequent_itemsets = apriori(basket, min_support=0.05, use_colnames=True)
# Generate rules with lift >= 1.0, then keep only those with confidence >= 0.4
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
rules = rules[rules['confidence'] >= 0.4]
# Show the ten rules with the highest lift
rules[['antecedents','consequents','support','confidence','lift']].sort_values(by='lift', ascending=False).head(10)

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 13 | (ground beef) | (spaghetti) | 0.055835 | 0.411095 | 1.791102 |
| 11 | (ground beef) | (mineral water) | 0.058733 | 0.432432 | 1.442184 |
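To make the `apriori` + `association_rules` steps less of a black box, here is a minimal pure-Python sketch of the same pipeline restricted to item pairs, on hypothetical toy baskets. The thresholds mirror the notebook's confidence (0.4) and lift (1.0) filters, but the support threshold is raised to 0.25 because the toy dataset is tiny. This is an illustration of the Apriori idea, not mlxtend's implementation.

```python
from itertools import combinations

# Hypothetical toy baskets; the notebook itself runs mlxtend on the Excel data.
transactions = [
    {"ground beef", "spaghetti", "mineral water"},
    {"ground beef", "spaghetti"},
    {"ground beef", "spaghetti", "mineral water"},
    {"mineral water", "eggs"},
    {"mineral water", "eggs"},
    {"eggs"},
    {"eggs", "spaghetti", "olive oil"},
    {"ground beef", "mineral water"},
]
MIN_SUPPORT = 0.25     # stands in for min_support=0.05 in the notebook
MIN_CONFIDENCE = 0.4   # same confidence filter as the notebook
MIN_LIFT = 1.0         # same lift threshold as the notebook


def support(items):
    """Fraction of transactions containing every item in `items`."""
    wanted = set(items)
    return sum(wanted <= t for t in transactions) / len(transactions)


# Pass 1: keep single items that meet the support threshold.
all_items = sorted({item for t in transactions for item in t})
frequent_items = [i for i in all_items if support([i]) >= MIN_SUPPORT]

# Pass 2: build candidate pairs only from frequent items (Apriori property:
# a pair cannot be frequent if either of its items is infrequent).
frequent_pairs = [p for p in combinations(frequent_items, 2) if support(p) >= MIN_SUPPORT]

# Turn each frequent pair into rules in both directions and filter them.
rules = []
for x, y in frequent_pairs:
    for a, b in ((x, y), (y, x)):
        conf = support((a, b)) / support([a])
        lift = conf / support([b])
        if conf >= MIN_CONFIDENCE and lift >= MIN_LIFT:
            rules.append((a, b, support((a, b)), conf, lift))

rules.sort(key=lambda r: r[4], reverse=True)  # highest lift first
for a, b, sup, conf, lift in rules:
    print(f"{a} -> {b}: support={sup:.3f}, confidence={conf:.3f}, lift={lift:.3f}")
```

"olive oil" is pruned in the first pass, so no pair containing it is ever counted; that pruning is what lets Apriori scale to many items.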


**Analysis and Interpretation**

Let's analyze the **two association rules** that pass the filters above (lift ≥ 1.0 and confidence ≥ 0.4) to identify meaningful product relationships.

**RULE 1: ground beef -> spaghetti**

**Support:** 5.6% - 5.6% of all transactions contain both ground beef and spaghetti.

**Confidence:** 41.1% - 41.1% of the transactions that contain ground beef also contain spaghetti.

**Lift:** 1.79 - spaghetti appears with ground beef about 1.79 times as often as it would if the two items were bought independently.

*   Insight: A common meal combination.
*   Use: Bundle or recommend together.

**RULE 2: ground beef -> mineral water**

**Support:** 5.9% - slightly higher than the previous rule; 5.9% of all transactions contain both items.

**Confidence:** 43.2% - 43.2% of the transactions that contain ground beef also contain mineral water.

**Lift:** 1.44 - above 1, so the pair occurs together about 1.44 times as often as expected by chance (a positive association).

*   Insight: May suggest health-conscious buying.
*   Use: Upsell or promote as a balanced meal combo.

**Pattern:** Customers buy items that go together (meals or healthy combos). Use these rules for smart promotions and product placement.
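As an arithmetic cross-check of the two rules, the metrics in the rule table can be rearranged: Confidence = Support(A ∪ B) / Support(A) gives Support(A), and Lift = Confidence / Support(B) gives Support(B). The sketch below only re-uses the table's numbers; `implied` is a hypothetical variable name.

```python
# Metric values copied from the rule table (support here is Support(A ∪ B)).
rules = {
    "ground beef -> spaghetti": {"support": 0.055835, "confidence": 0.411095, "lift": 1.791102},
    "ground beef -> mineral water": {"support": 0.058733, "confidence": 0.432432, "lift": 1.442184},
}

implied = {}
for name, m in rules.items():
    implied[name] = {
        # Confidence = Support(A ∪ B) / Support(A)  =>  Support(A) = Support(A ∪ B) / Confidence
        "antecedent": m["support"] / m["confidence"],
        # Lift = Confidence / Support(B)  =>  Support(B) = Confidence / Lift
        "consequent": m["confidence"] / m["lift"],
        # Percentage by which co-occurrence exceeds the independence baseline
        "above_chance_pct": (m["lift"] - 1) * 100,
    }
    r = implied[name]
    print(f"{name}: Support(A)={r['antecedent']:.4f}, Support(B)={r['consequent']:.4f}, "
          f"{r['above_chance_pct']:.0f}% above chance")
```

Both rules should imply roughly the same Support(ground beef), about 0.136, since they share an antecedent; the consequents come out at roughly 0.23 (spaghetti) and 0.30 (mineral water). A lift of 1.79 therefore reads as "about 79% more often than chance", not "1.79%".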


**Interview Questions**


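1. What is lift and why is it important in association rules?  

**Answer:** Lift measures how much more likely two items are to be bought together compared to them being bought independently. Unlike confidence, it accounts for how popular B is on its own, so it exposes rules that look strong only because B is a common item.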

$$\text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}$$



*   **Lift > 1 - Positive association:** Items are bought together more often than by chance.
*   **Lift = 1 - No association:** Buying A doesn't affect the likelihood of buying B.
*   **Lift < 1 - Negative association:** Items are bought together less often than expected.
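The three cases can be seen by computing lift directly from supports. This is a minimal sketch with hypothetical support values (A and B each in 20% of baskets); `lift` and `interpret` are illustrative helper names, and the small tolerance guards against floating-point rounding when lift is exactly 1.

```python
def lift(support_ab: float, support_a: float, support_b: float) -> float:
    """Lift(A => B) = Confidence(A => B) / Support(B), with Confidence = Support(A ∪ B) / Support(A)."""
    confidence = support_ab / support_a
    return confidence / support_b


def interpret(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the three cases listed above."""
    if value > 1 + tol:
        return "positive association"
    if value < 1 - tol:
        return "negative association"
    return "no association (independent)"


# Hypothetical supports: A and B each appear in 20% of baskets.
for s_ab in (0.10, 0.04, 0.02):
    value = lift(s_ab, 0.20, 0.20)
    print(f"Support(A∪B)={s_ab:.2f} -> lift={value:.2f}: {interpret(value)}")
```

Under independence the pair would appear in 0.20 × 0.20 = 4% of baskets, which is why 0.04 gives a lift of 1.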


2. What are support and confidence, and how do you calculate them?

**Answer:** **Support** is the relative frequency of an itemset in the dataset, i.e., the fraction of all transactions that contain it.

$$\text{Support}(A \Rightarrow B) = \frac{\text{transactions containing both } A \text{ and } B}{\text{total transactions}}$$

**Confidence** tells you how often item B is purchased when item A is purchased, i.e., the fraction of transactions containing A that also contain B.

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
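Both formulas translate directly into code. A minimal sketch on hypothetical bread/milk/butter baskets (not from the notebook's data); `support` and `confidence` are illustrative helper names, and Support(A ∪ B) means the share of baskets containing every item of A and B.

```python
# Hypothetical baskets for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A): share of A-baskets that also contain B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(f"Support(bread => milk) = {support({'bread', 'milk'}, transactions):.2f}")           # 3 of 5 baskets
print(f"Confidence(bread => milk) = {confidence({'bread'}, {'milk'}, transactions):.2f}")  # 3 of 4 bread baskets
```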

3. What are some limitations or challenges of association rule mining?

**Answer:**

*   It generates a large number of rules, which makes it hard to identify actionable insights without proper filtering.

*   In large datasets, most transactions contain only a few of the many available items, so the data are sparse and strong rules are hard to find.

*   Choosing the right support and confidence thresholds is tricky:
    1. Too low: too many noisy rules.
    2. Too high: interesting but rare patterns are missed.
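The threshold trade-off is easy to see by counting frequent itemsets at a few support levels. This sketch uses hypothetical baskets and a brute-force count (no Apriori pruning), so it only illustrates the effect of the threshold; `count_frequent_itemsets` is an illustrative helper name.

```python
from itertools import combinations

# Hypothetical baskets; counts are illustrative, not from the notebook's data.
transactions = [
    {"ground beef", "spaghetti", "mineral water"},
    {"ground beef", "spaghetti"},
    {"mineral water", "eggs"},
    {"eggs", "spaghetti", "olive oil"},
    {"mineral water", "eggs", "chocolate"},
    {"ground beef", "mineral water"},
]


def count_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force count of itemsets (size 1..max_size) whose support >= min_support."""
    n = len(transactions)
    candidates = set()
    for t in transactions:
        for k in range(1, max_size + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(t), k))
    return sum(
        1 for c in candidates
        if sum(c <= t for t in transactions) / n >= min_support
    )


for min_support in (0.1, 0.3, 0.5):
    print(min_support, count_frequent_itemsets(transactions, min_support))
```

At 0.1 every itemset seen even once (including one-off items like "chocolate") counts as frequent; at 0.5 only single items survive and the ground beef/spaghetti pair is lost.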





