# Simulate Transaction Data

## Objective
- Simulate 10 fake transactions, each containing 2–5 items, selected randomly from a pool of at least 8 unique grocery items.
- This simulates shopping basket data commonly used in market basket analysis.



```python
import random
import pandas as pd

# Define 8 unique grocery items
items = ['Bread', 'Milk', 'Eggs', 'Cheese', 'Butter', 'Apples', 'Bananas', 'Cereal']

# Simulate 10 transactions with 2 to 5 random items each.
# No random seed is set, so each run produces different baskets.
transactions = []
for _ in range(10):
    transaction = random.sample(items, random.randint(2, 5))
    transactions.append(transaction)

# Display the transactions
for idx, t in enumerate(transactions, 1):
    print(f"Transaction {idx}: {t}")
```


```text
Transaction 1: ['Cereal', 'Eggs']
Transaction 2: ['Bread', 'Milk']
Transaction 3: ['Cereal', 'Milk', 'Butter', 'Apples']
Transaction 4: ['Cereal', 'Eggs', 'Bananas', 'Bread', 'Cheese']
Transaction 5: ['Apples', 'Cereal', 'Butter', 'Bananas']
Transaction 6: ['Eggs', 'Bananas', 'Apples', 'Cereal']
Transaction 7: ['Milk', 'Cheese', 'Eggs', 'Apples']
Transaction 8: ['Bread', 'Apples']
Transaction 9: ['Eggs', 'Cheese', 'Milk', 'Apples']
Transaction 10: ['Apples', 'Cheese']
```


### Explanation:
- We created a pool of 8 grocery items.
- Each transaction contains 2–5 items chosen with `random.sample`, which samples without replacement, so no item appears twice in the same basket.
- We generated 10 such transactions and printed them for review.

This forms the base dataset for applying association rule mining using the Apriori algorithm in the next steps.
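To make the simulation repeatable, a minimal sketch assuming a dedicated `random.Random` generator with an arbitrary seed of `42` (an illustrative choice; it will not recreate the exact baskets printed above), plus checks that every basket respects the 2–5 item rule:

```python
import random

# Same 8-item pool as the notebook
items = ['Bread', 'Milk', 'Eggs', 'Cheese', 'Butter', 'Apples', 'Bananas', 'Cereal']

# Private generator with a fixed seed (42 is an arbitrary, illustrative value)
rng = random.Random(42)

# 10 baskets of 2-5 distinct items each
transactions = [rng.sample(items, rng.randint(2, 5)) for _ in range(10)]

# Sanity checks on the simulated data
assert len(transactions) == 10
for basket in transactions:
    assert 2 <= len(basket) <= 5            # basket size within bounds
    assert len(set(basket)) == len(basket)  # no duplicate items in a basket
    assert set(basket) <= set(items)        # only items from the pool

for idx, t in enumerate(transactions, 1):
    print(f"Transaction {idx}: {t}")
```

Using a private generator instead of calling `random.seed` keeps this simulation's randomness isolated from any other code that uses the global `random` functions.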


## One-Hot Encoding and Apriori Analysis (4 Marks)

### Objective:
Transform the simulated transactions into a format suitable for association rule mining, and use the Apriori algorithm to find frequent itemsets with a minimum support threshold of 0.3 (30%).


```python
from mlxtend.preprocessing import TransactionEncoder

# Learn the item vocabulary and encode each transaction as a row of True/False flags
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)

# Convert to a DataFrame with one column per item
df = pd.DataFrame(te_ary, columns=te.columns_)

# Display the first five rows of the one-hot encoded transaction matrix
df.head()
```


| | Apples | Bananas | Bread | Butter | Cereal | Cheese | Eggs | Milk |
|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | True | False | True | False |
| 1 | False | False | True | False | False | False | False | True |
| 2 | True | False | False | True | True | False | False | True |
| 3 | False | True | True | False | True | True | True | False |
| 4 | True | True | False | True | True | False | False | False |
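The encoded columns above are the distinct items in alphabetical order, with `True` wherever a basket contains that item. A minimal pure-Python sketch of the same transformation, for illustration only (the helper `one_hot` is hypothetical and not part of mlxtend), applied to the first five baskets printed earlier:

```python
def one_hot(transactions):
    """Return (columns, rows): sorted item names and one Boolean row per basket."""
    # Distinct items across all baskets, sorted alphabetically
    columns = sorted({item for basket in transactions for item in basket})
    # True where the basket contains the column's item
    rows = [[item in basket_set for item in columns]
            for basket_set in map(set, transactions)]
    return columns, rows


# First five baskets printed in the simulation step
transactions = [
    ['Cereal', 'Eggs'],
    ['Bread', 'Milk'],
    ['Cereal', 'Milk', 'Butter', 'Apples'],
    ['Cereal', 'Eggs', 'Bananas', 'Bread', 'Cheese'],
    ['Apples', 'Cereal', 'Butter', 'Bananas'],
]

columns, rows = one_hot(transactions)
print(columns)
for row in rows:
    print(row)
```

Passing the result to `pd.DataFrame(rows, columns=columns)` would give a frame with the same layout as `df`.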


```python
from mlxtend.frequent_patterns import apriori

# Apply the Apriori algorithm with a minimum support of 0.3;
# use_colnames=True reports item names instead of column indices
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Display the frequent itemsets
frequent_itemsets
```


| | support | itemsets |
|---|---|---|
| 0 | 0.7 | (Apples) |
| 1 | 0.3 | (Bananas) |
| 2 | 0.3 | (Bread) |
| 3 | 0.5 | (Cereal) |
| 4 | 0.4 | (Cheese) |
| 5 | 0.5 | (Eggs) |
| 6 | 0.4 | (Milk) |
| 7 | 0.3 | (Cereal, Apples) |
| 8 | 0.3 | (Cheese, Apples) |
| 9 | 0.3 | (Eggs, Apples) |


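### Explanation:
- We used `TransactionEncoder` to convert the list of transactions into a one-hot encoded matrix: one row per transaction and one Boolean column per item.
- The Apriori algorithm from `mlxtend` was used to discover itemsets that appear in **at least 30% of the transactions** (here, at least 3 of the 10).
- The listing above stops at 10 rows. Counting the ten baskets directly, four more pairs also reach exactly 0.3 support: (Bananas, Cereal), (Cereal, Eggs), (Milk, Apples) and (Cheese, Eggs). Three of them give rise to rules in the next step.
- These frequent itemsets are the input for generating association rules.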
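Apriori works level by level: it keeps the single items that meet the support threshold, then only considers larger itemsets whose every smaller subset is already frequent, since an itemset can never be more frequent than any of its subsets. A compact pure-Python sketch of that idea (the function `apriori_sketch` is hypothetical and does not mirror mlxtend's internals), run on the ten baskets printed earlier:

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item in the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: keep single items that meet the threshold
    singles = {frozenset([item]) for basket in baskets for item in basket}
    level = {s for s in singles if support(s) >= min_support}
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have any infrequent (k-1)-item subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in level)
        k += 1
    return frequent


# The ten baskets printed in the simulation step
transactions = [
    ['Cereal', 'Eggs'],
    ['Bread', 'Milk'],
    ['Cereal', 'Milk', 'Butter', 'Apples'],
    ['Cereal', 'Eggs', 'Bananas', 'Bread', 'Cheese'],
    ['Apples', 'Cereal', 'Butter', 'Bananas'],
    ['Eggs', 'Bananas', 'Apples', 'Cereal'],
    ['Milk', 'Cheese', 'Eggs', 'Apples'],
    ['Bread', 'Apples'],
    ['Eggs', 'Cheese', 'Milk', 'Apples'],
    ['Apples', 'Cheese'],
]

result = apriori_sketch(transactions, min_support=0.3)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{supp:.1f}  {sorted(itemset)}")
```

On this data, two three-item candidates survive pruning ({Apples, Cereal, Eggs} and {Apples, Cheese, Eggs}), but they reach only 0.1 and 0.2 support, so the search stops at pairs.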


## Generate Association Rules 

### Objective:
Using the frequent itemsets from the previous step, we now generate association rules. We keep only rules with **confidence ≥ 0.7**, meaning that at least 70% of the transactions containing the rule's antecedent also contain its consequent.


```python
from mlxtend.frequent_patterns import association_rules

# Generate rules using confidence as the metric
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Display selected rule columns
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
```


| | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (Cheese) | (Apples) | 0.3 | 0.75 | 1.071429 |
| 1 | (Milk) | (Apples) | 0.3 | 0.75 | 1.071429 |
| 2 | (Bananas) | (Cereal) | 0.3 | 1.0 | 2.0 |
| 3 | (Cheese) | (Eggs) | 0.3 | 0.75 | 1.5 |
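The confidence and lift columns are ratios of supports: confidence(X → Y) = support(X ∪ Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y), so a lift of 1 means X carries no information about Y. A minimal sketch that recomputes these rules from the ten printed baskets, assuming rules come only from two-item itemsets (the largest frequent itemsets in this data); the helper `support` and the constants are illustrative, not mlxtend's API:

```python
from itertools import combinations

# The ten baskets printed in the simulation step
transactions = [
    ['Cereal', 'Eggs'],
    ['Bread', 'Milk'],
    ['Cereal', 'Milk', 'Butter', 'Apples'],
    ['Cereal', 'Eggs', 'Bananas', 'Bread', 'Cheese'],
    ['Apples', 'Cereal', 'Butter', 'Bananas'],
    ['Eggs', 'Bananas', 'Apples', 'Cereal'],
    ['Milk', 'Cheese', 'Eggs', 'Apples'],
    ['Bread', 'Apples'],
    ['Eggs', 'Cheese', 'Milk', 'Apples'],
    ['Apples', 'Cheese'],
]
baskets = [frozenset(t) for t in transactions]
n = len(baskets)


def support(*items):
    """Fraction of baskets that contain all of the given items."""
    wanted = frozenset(items)
    return sum(wanted <= basket for basket in baskets) / n


MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.7
all_items = sorted({item for basket in baskets for item in basket})

rules = []
for x, y in combinations(all_items, 2):
    pair_support = support(x, y)
    if pair_support < MIN_SUPPORT:  # only frequent pairs generate rules
        continue
    # Each frequent pair gives two candidate rules: x -> y and y -> x
    for ante, cons in ((x, y), (y, x)):
        confidence = pair_support / support(ante)  # share of ante-baskets that contain cons
        lift = confidence / support(cons)           # confidence relative to cons's base rate
        if confidence >= MIN_CONFIDENCE:
            rules.append((ante, cons, pair_support, confidence, lift))

for ante, cons, s, c, l in rules:
    print(f"{ante} -> {cons}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

It reports the same four rules and values as the `association_rules` output. For example, Bananas → Cereal has confidence 0.3 / 0.3 = 1.0 and lift 1.0 / 0.5 = 2.0.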


### ✅ Interpretation of Association Rules

Here are explanations for each rule based on the output:

---

🔹 **Rule 1:**  
**If a customer buys Cheese → they also buy Apples**  
- Support: 0.30  
- Confidence: 0.75  
- Lift: 1.07  
> 📘 **Meaning:** 75% of the transactions that contain Cheese also contain Apples. However, Apples already appear in 70% of all transactions, so the lift of 1.07 (0.75 / 0.70) indicates only a very weak positive association.

---

🔹 **Rule 2:**  
**If a customer buys Milk → they also buy Apples**  
- Support: 0.30  
- Confidence: 0.75  
- Lift: 1.07  
> 📘 **Meaning:** As with Rule 1, the 75% confidence mostly reflects how popular Apples are overall. With a lift of only 1.07, Milk buyers are barely more likely to buy Apples than customers in general.

---

🔹 **Rule 3:**  
**If a customer buys Bananas → they also buy Cereal**  
- Support: 0.30  
- Confidence: 1.00  
- Lift: 2.00  
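> 🚀 **Meaning:** Every transaction containing Bananas also contains Cereal. The lift of 2.0 means Cereal is twice as likely in baskets with Bananas (100%) as in baskets overall (50%). This is the strongest pattern here and a candidate for promotions or bundling, although it rests on only 3 of the 10 transactions.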

---

🔹 **Rule 4:**  
**If a customer buys Cheese → they also buy Eggs**  
- Support: 0.30  
- Confidence: 0.75  
- Lift: 1.5  
> 📘 **Meaning:** 75% of Cheese buyers also buy Eggs, and the lift of 1.5 suggests a moderate association. These items could be placed near each other in a store layout.

---

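> ✅ Because the baskets were generated at random, these rules reflect chance co-occurrences in a small sample rather than real shopping behavior. Rules mined the same way from real transaction data could support: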
> - Product placement
> - Discount bundles
> - Recommender systems
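As a rough illustration of the recommender use case, a minimal sketch that suggests the consequents of rules whose antecedent is already in a customer's basket. The `recommend` function, its `min_lift` cutoff, and ranking by lift are illustrative assumptions, not a standard API; the rule values are copied from the table above:

```python
# Rules from the table above: (antecedent, consequent, confidence, lift)
RULES = [
    ('Cheese', 'Apples', 0.75, 1.071429),
    ('Milk', 'Apples', 0.75, 1.071429),
    ('Bananas', 'Cereal', 1.0, 2.0),
    ('Cheese', 'Eggs', 0.75, 1.5),
]


def recommend(basket, rules=RULES, min_lift=1.0):
    """Suggest consequents of rules triggered by the basket, highest lift first.

    Items already in the basket are never suggested; rules with lift at or
    below min_lift are ignored.
    """
    basket = set(basket)
    best = {}
    for ante, cons, conf, lift in rules:
        if ante in basket and cons not in basket and lift > min_lift:
            # Keep the strongest triggering rule for each suggested item
            best[cons] = max(best.get(cons, 0.0), lift)
    return sorted(best, key=best.get, reverse=True)


print(recommend(['Cheese', 'Bananas']))
```

Raising `min_lift` (for example to 1.2) drops the near-independent Apples rules and keeps only the Cereal and Eggs suggestions.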
