# Association Rules Assignment

**Objective:** Perform market basket analysis using the Apriori algorithm to discover relationships between items in customer transactions.

# Data Preprocessing

In [1]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Load dataset (pd.read_excel treats the first row as the header by default)
file_path = r"C:/Users/prern/OneDrive/Desktop/New folder/Association Rules/Association Rules/Online retail.xlsx"  # User's original dataset path
df = pd.read_excel(file_path)

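# Each row of the first column holds one transaction as a comma-separated string;
# split it into a de-duplicated list of items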
transactions = df.iloc[:, 0].apply(lambda x: list(set(x.strip().split(','))))

# Convert to one-hot encoded format
te = TransactionEncoder()
te_ary = te.fit_transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)
df_encoded.head()

| | asparagus | almonds | antioxydant juice | asparagus.1 | avocado | babies food | bacon | barbecue sauce | black tea | blueberries | ... | turkey | vegetables mix | water spray | white wine | whole weat flour | whole wheat pasta | whole wheat rice | yams | yogurt cake | zucchini |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | True | False | False | False | False | False | ... | True | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | True | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
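
The `asparagus.1` column in the preview suggests that two distinct item labels end up with the same visible name, which could happen if some items in the raw strings carry leading or trailing spaces. A minimal sketch of a per-item cleanup, assuming that is the cause (the helper name `clean_transaction` and the sample string are hypothetical and not part of the original notebook):

```python
def clean_transaction(raw: str) -> list[str]:
    """Split a comma-separated basket and normalise each item label."""
    items = (item.strip() for item in raw.split(","))
    # Drop empty labels (e.g. from a trailing comma) and de-duplicate
    return sorted({item for item in items if item})


# Hypothetical raw row with inconsistent spacing around item names
print(clean_transaction("burgers, asparagus,asparagus ,eggs,"))
```

Applying a cleanup like this would change the encoded columns and could shift support values slightly, so the later cells would need to be re-run.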


# Association Rule Mining with Apriori

In [2]:
from mlxtend.frequent_patterns import apriori, association_rules

# Apply Apriori algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.02, use_colnames=True)

# Generate rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
rules = rules.sort_values(by='lift', ascending=False)
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head()

| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 47 | (spaghetti) | (ground beef) | 0.0392 | 0.225115 | 2.290857 |
| 46 | (ground beef) | (spaghetti) | 0.0392 | 0.398915 | 2.290857 |
| 69 | (spaghetti) | (olive oil) | 0.022933 | 0.1317 | 2.003547 |
| 68 | (olive oil) | (spaghetti) | 0.022933 | 0.348884 | 2.003547 |
| 60 | (soup) | (mineral water) | 0.023067 | 0.456464 | 1.915771 |
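
To make the frequent-itemset step more concrete, here is a minimal level-wise sketch of the Apriori idea in plain Python: grow itemsets one item at a time and discard any candidate that has an infrequent subset. It is a teaching sketch under stated assumptions, not mlxtend's implementation; the function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    items = {item for b in baskets for item in b}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemsets
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, unrelated to the real dataset
toy = [
    ["spaghetti", "ground beef", "olive oil"],
    ["spaghetti", "ground beef"],
    ["spaghetti", "olive oil"],
    ["mineral water", "soup"],
    ["mineral water"],
]
result = apriori_sketch(toy, min_support=0.4)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), round(result[itemset], 2))
```

The prune step is what keeps the search tractable: a three-item candidate is never counted against the data if any of its two-item subsets already failed the support threshold.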


# Analysis and Interpretation

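- **Lift > 1** indicates that the items appear together more often than they would if purchased independently; the further above 1, the stronger the association.
- For instance, `spaghetti` → `ground beef` has a lift of about 2.29, so these two items are bought together roughly 2.3 times as often as independence would predict.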
- These insights can inform store layout and promotions.
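
As a sanity check on how the columns of the rules table relate to each other, the sketch below recomputes lift for the spaghetti and ground beef pair from the reported support and confidence values. It assumes the standard definitions (confidence = joint support / antecedent support, lift = joint support / product of individual supports); the variable names are illustrative only.

```python
# Values copied from the rules table for spaghetti <-> ground beef
support_both = 0.0392      # support(spaghetti and ground beef)
conf_s_to_g = 0.225115     # confidence(spaghetti -> ground beef)
conf_g_to_s = 0.398915     # confidence(ground beef -> spaghetti)

# confidence(A -> B) = support(A and B) / support(A), so support(A) follows
support_spaghetti = support_both / conf_s_to_g
support_ground_beef = support_both / conf_g_to_s

# lift = support(A and B) / (support(A) * support(B))
lift = support_both / (support_spaghetti * support_ground_beef)

print(f"support(spaghetti)   = {support_spaghetti:.4f}")
print(f"support(ground beef) = {support_ground_beef:.4f}")
print(f"lift                 = {lift:.3f}")
```

The recomputed lift comes out at about 2.29, in line with the table. The two directions of the rule differ in confidence only because spaghetti is the more common item of the two.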


# Conclusion

In this assignment, we performed market basket analysis on an online retail dataset using the Apriori algorithm. We:

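- Preprocessed the dataset by splitting each transaction into a list of items and one-hot encoding it with `TransactionEncoder`.
- Applied the Apriori algorithm with a minimum support of 0.02 to identify frequent itemsets.
- Generated association rules with a minimum lift of 1.2 to uncover relationships between purchased products.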
- Analyzed the rules using metrics such as support, confidence, and lift.

The strongest rules pair spaghetti with ground beef and with olive oil, and soup with mineral water. Findings like these can guide decisions on product placement, bundling, and targeted marketing.


##  Interview Questions and Answers

---

###  1. What is **Lift** and why is it important in Association Rules?

**Answer:**  
Lift measures how much more likely two items are to be bought together compared to being bought independently. It is calculated as:

\[
\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}
\]

- **Lift > 1**: A and B appear together more often than expected (positive correlation).
- **Lift = 1**: A and B appear together exactly as often as expected if they were independent (no correlation).
- **Lift < 1**: A and B appear together less often than expected (negative correlation).

**Why it's important:**  
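Lift helps identify **meaningful and interesting associations**: unlike confidence, it corrects for how popular the consequent is on its own, so a rule is not ranked highly just because B appears in most baskets anyway.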
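
Substituting the standard definition of confidence, Confidence(A → B) = Support(A ∪ B) / Support(A), gives a symmetric form of lift. This short derivation explains why a rule and its reverse report the same lift value:

\[
\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)} = \frac{\text{Support}(A \cup B)}{\text{Support}(A)\,\text{Support}(B)} = \text{Lift}(B \rightarrow A)
\]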

---

###  2. What is **Support** and **Confidence**? How do you calculate them?

**Answer:**

- **Support** measures how frequently an itemset appears in the dataset.

\[
\text{Support}(A) = \frac{\text{Number of transactions containing A}}{\text{Total number of transactions}}
\]

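- **Confidence** measures how often item B appears in transactions that contain item A. Here Support(A ∪ B) is the fraction of transactions that contain every item of both A and B.

\[
\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}
\]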

These metrics help in filtering the most frequent and reliable rules.
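
The two definitions can be checked by hand on a tiny example. The sketch below computes them directly from a list of baskets; the helper names `support` and `confidence` and the four toy transactions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical toy transactions
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(toy, {"bread"}))               # 3 of 4 transactions contain bread
print(support(toy, {"bread", "milk"}))       # 2 of 4 contain both
print(confidence(toy, {"bread"}, {"milk"}))  # 2 of the 3 bread transactions
```

In this toy data milk appears in 3 of 4 baskets overall but in only 2 of the 3 bread baskets, so the rule `bread` → `milk` has a lift below 1 despite a confidence of about 0.67.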

---

###  3. What are some **limitations or challenges** of Association Rule Mining?

**Answer:**

- **Large number of rules**: Can generate too many rules, making it hard to interpret.
- **Sparse data**: Market baskets are usually sparse, so most itemsets have very low support; lowering the support threshold to capture them tends to produce many trivial or irrelevant rules.
- **Support-Confidence trade-off**: High-support rules might be obvious; low-support rules might not be reliable.
- **No time consideration**: It does not account for the order or time of purchases (addressed by sequential pattern mining).
- **Binary limitation**: Requires one-hot encoded data, losing quantity information (e.g., how many units of an item were bought).
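
To give a sense of scale for the first point, the sketch below counts how many distinct rules A → B (with A and B non-empty and disjoint) are possible over `d` items, before any support or confidence filtering. It compares a brute-force count with the closed form 3^d − 2^(d+1) + 1; the function names are illustrative.

```python
from itertools import product


def count_rules_brute_force(d: int) -> int:
    """Count rules A -> B with A, B non-empty and disjoint over d items."""
    count = 0
    # Assign each item to: 0 = unused, 1 = antecedent, 2 = consequent
    for assignment in product((0, 1, 2), repeat=d):
        if 1 in assignment and 2 in assignment:
            count += 1
    return count


def count_rules_formula(d: int) -> int:
    # 3^d assignments minus those with an empty side (inclusion-exclusion)
    return 3**d - 2 ** (d + 1) + 1


for d in (2, 3, 6, 10):
    print(d, count_rules_formula(d), count_rules_brute_force(d))
```

Growth is exponential in the number of items, which is why thresholds such as `min_support` and `min_threshold` are needed to keep the rule set interpretable.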

---

These questions help demonstrate conceptual understanding in interviews for data mining, analytics, or data science roles.
