### Step 1: Simulate Transaction Data (with comments)
Here we create a dataset of 10 transactions where some items frequently appear together (to ensure association rules can be discovered later).

In [43]:
import pandas as pd          # Import pandas for data handling

# Define a pool of 10 unique items that customers can buy
items_pool = ['Bread', 'Milk', 'Eggs', 'Butter', 'Cheese',
              'Juice', 'Apples', 'Bananas', 'Tomatoes', 'Chicken']

# Create 10 transactions manually with overlapping items
# For example, 'Bread' and 'Milk' are together in several transactions
transactions = [
    ['Bread', 'Milk', 'Eggs'],            # Bread + Milk + Eggs
    ['Bread', 'Milk'],                     # Bread + Milk again
    ['Bread', 'Milk', 'Butter'],           # Bread + Milk + Butter
    ['Eggs', 'Cheese', 'Butter'],          # Eggs + Cheese + Butter
    ['Bread', 'Eggs', 'Juice'],            # Bread + Eggs + Juice
    ['Milk', 'Cheese'],                    # Milk + Cheese
    ['Bread', 'Milk', 'Eggs'],             # Bread + Milk + Eggs again
    ['Juice', 'Apples'],                   # Juice + Apples
    ['Bread', 'Milk'],                     # Bread + Milk again
    ['Eggs', 'Butter']                     # Eggs + Butter
]

# Display all simulated transactions
print("=== Simulated Transactions ===")
for idx, t in enumerate(transactions, 1):
    print(f"Transaction {idx}: {t}")

=== Simulated Transactions ===
Transaction 1: ['Bread', 'Milk', 'Eggs']
Transaction 2: ['Bread', 'Milk']
Transaction 3: ['Bread', 'Milk', 'Butter']
Transaction 4: ['Eggs', 'Cheese', 'Butter']
Transaction 5: ['Bread', 'Eggs', 'Juice']
Transaction 6: ['Milk', 'Cheese']
Transaction 7: ['Bread', 'Milk', 'Eggs']
Transaction 8: ['Juice', 'Apples']
Transaction 9: ['Bread', 'Milk']
Transaction 10: ['Eggs', 'Butter']


### Convert transactions into a one-hot encoded DataFrame
This converts the list of transactions into a one-hot encoded table, where:

* Rows = transactions
* Columns = items
* Value = `True` if the item is in the transaction, `False` otherwise

In [44]:
# Create a list to hold one-hot encoded rows
encoded_rows = []

# Loop through each transaction
for transaction in transactions:
    # For each transaction, create a dictionary where:
    # key = item name, value = True (if item is in transaction), False otherwise
    row = {item: (item in transaction) for item in items_pool}
    encoded_rows.append(row)  # Add this row to the list

# Convert the list of dictionaries into a pandas DataFrame
encoded_data = pd.DataFrame(encoded_rows)

# Display the one-hot encoded DataFrame
print("\n=== One-Hot Encoded Transaction Data ===")
print(encoded_data)



=== One-Hot Encoded Transaction Data ===
   Bread   Milk   Eggs  Butter  Cheese  Juice  Apples  Bananas  Tomatoes  \
0   True   True   True   False   False  False   False    False     False   
1   True   True  False   False   False  False   False    False     False   
2   True   True  False    True   False  False   False    False     False   
3  False  False   True    True    True  False   False    False     False   
4   True  False   True   False   False   True   False    False     False   
5  False   True  False   False    True  False   False    False     False   
6   True   True   True   False   False  False   False    False     False   
7  False  False  False   False   False   True    True    False     False   
8   True   True  False   False   False  False   False    False     False   
9  False  False   True    True   False  False   False    False     False   

   Chicken  
0    False  
1    False  
2    False  
3    False  
4    False  
5    False  
6    False  
7    False  
8    False  
9    False  
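
Because every column of the one-hot table is a True/False indicator, the fraction of `True` values in a column is that item's support. The following is a minimal pure-Python sketch of the same encoding, assuming the transactions from Step 1; the names `one_hot` and `item_support` are illustrative and do not appear in the notebook.

```python
# Same transactions and item pool as in Step 1
transactions = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Milk'],
    ['Bread', 'Milk', 'Butter'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Bread', 'Eggs', 'Juice'],
    ['Milk', 'Cheese'],
    ['Bread', 'Milk', 'Eggs'],
    ['Juice', 'Apples'],
    ['Bread', 'Milk'],
    ['Eggs', 'Butter'],
]
items_pool = ['Bread', 'Milk', 'Eggs', 'Butter', 'Cheese',
              'Juice', 'Apples', 'Bananas', 'Tomatoes', 'Chicken']

# One row per transaction, one True/False entry per item
one_hot = [{item: (item in t) for item in items_pool} for t in transactions]

# Support of an item = fraction of rows where its column is True
item_support = {
    item: sum(row[item] for row in one_hot) / len(one_hot)
    for item in items_pool
}

for item, support in item_support.items():
    print(f"{item:<9} {support:.1f}")
```

With the pandas table from the cell above, `encoded_data.mean()` should give the same per-item fractions.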

### Step 2: Analyze with Apriori Algorithm
Here we use the Apriori algorithm to find frequent itemsets (items or item combinations that occur in at least 30% of the transactions).

In [45]:

from mlxtend.frequent_patterns import apriori

# Apply the Apriori algorithm to find frequent itemsets
frequent_itemsets = apriori(
    encoded_data,          # The one-hot encoded data
    min_support=0.3,       # Minimum support threshold (30%)
    use_colnames=True      # Show item names instead of column indices
)

# Display frequent itemsets
print("\n Frequent Itemsets (Support >= 30%) ")
print(frequent_itemsets)



 Frequent Itemsets (Support >= 30%) 
   support       itemsets
0      0.6        (Bread)
1      0.6         (Milk)
2      0.5         (Eggs)
3      0.3       (Butter)
4      0.5  (Milk, Bread)
5      0.3  (Eggs, Bread)
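
The six itemsets above can be cross-checked without mlxtend. The sketch below is a simplified, level-wise version of the Apriori idea (grow itemsets one item at a time and discard any candidate that has an infrequent subset). It is an illustration under those assumptions, not mlxtend's implementation, and the helper `support` and the variable names are hypothetical.

```python
from itertools import combinations

# Same transactions as in Step 1, stored as sets for subset tests
baskets = [frozenset(t) for t in [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Milk'],
    ['Bread', 'Milk', 'Butter'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Bread', 'Eggs', 'Juice'],
    ['Milk', 'Cheese'],
    ['Bread', 'Milk', 'Eggs'],
    ['Juice', 'Apples'],
    ['Bread', 'Milk'],
    ['Eggs', 'Butter'],
]]
min_support = 0.3


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


# Level 1: frequent single items
all_items = sorted({item for basket in baskets for item in basket})
current = [frozenset([i]) for i in all_items
           if support(frozenset([i])) >= min_support]

frequent = {}
k = 1
while current:
    for itemset in current:
        frequent[itemset] = support(itemset)
    k += 1
    current_set = set(current)
    # Candidates: unions of two frequent (k-1)-itemsets that have exactly k items
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in current_set
                         for s in combinations(c, k - 1))}
    current = [c for c in candidates if support(c) >= min_support]

for itemset, s in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{s:.1f}  {sorted(itemset)}")
```

The candidate {Bread, Eggs, Milk} is pruned before its support is counted, because its subset {Eggs, Milk} occurs in only 2 of 10 transactions.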


### Step 3: Generate Association Rules
Now we generate association rules from the frequent itemsets, keeping only those with confidence ≥ 70%.

In [None]:

from mlxtend.frequent_patterns import association_rules  # Import association rules generator

# Re-run Apriori on the one-hot encoded data (min_support = 0.3)
frequent_itemsets = apriori(encoded_data, min_support=0.3, use_colnames=True)

# Generate association rules with confidence >= 70%
rules = association_rules(
    frequent_itemsets,      # Input: frequent itemsets from Apriori
    metric="confidence",    # Use confidence as the metric
    min_threshold=0.7       # Minimum confidence threshold: 70%
)

# Select important columns for display
rules_summary = rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

# Display the top 2 rules
print("\n=== Top 2 Association Rules (Confidence >= 70%) ===")
print(rules_summary.head(2))

# Show the full results: frequent itemsets and all rules that passed the threshold
print("Frequent Itemsets:\n")
print(frequent_itemsets)
print("\nAssociation Rules:\n", rules_summary)


=== Top 2 Association Rules (Confidence >= 70%) ===
  antecedents consequents  support  confidence      lift
0      (Milk)     (Bread)      0.5    0.833333  1.388889
1     (Bread)      (Milk)      0.5    0.833333  1.388889
Frequent Itemsets:

   support       itemsets
0      0.6        (Bread)
1      0.3       (Butter)
2      0.5         (Eggs)
3      0.6         (Milk)
4      0.3  (Eggs, Bread)
5      0.5  (Milk, Bread)

Association Rules:
   antecedents consequents  support  confidence      lift
0      (Milk)     (Bread)      0.5    0.833333  1.388889
1     (Bread)      (Milk)      0.5    0.833333  1.388889



## Explanation of the Association Rule Mining Process

This process generates association rules from the frequent itemsets mined earlier. Each rule connects an **antecedent** (the "if" part) with a **consequent** (the "then" part), showing how likely they are to occur together.

---

### Rule 1: *Milk → Bread*

**Support**
The support of this rule is **0.5**, meaning Milk and Bread appear together in **50% of all transactions**.

**Confidence**
Confidence measures how often the rule has been found to be true:

$$
\text{Confidence} = \frac{\text{Support(Milk ∩ Bread)}}{\text{Support(Milk)}}
$$

Here:

* Support(Milk ∩ Bread) = 0.5 (Milk and Bread together in 50% of transactions)
* Support(Milk) = 0.6 (Milk appears in 60% of transactions, with or without other items)

$$
\text{Confidence} = \frac{0.5}{0.6} \approx 0.83 \ (83\%)
$$

**Interpretation:**
When Milk is purchased, Bread is also purchased **83% of the time**. This indicates a strong tendency for these items to be bought together.

**Lift**
Lift compares the rule's confidence with the baseline probability of buying Bread, which is Support(Bread) = 0.6:

$$
\text{Lift} = \frac{\text{Confidence}}{\text{Support(Bread)}} = \frac{0.833}{0.6} \approx 1.39
$$

Since Lift > 1, this is a **positive association**: Milk and Bread are bought together more often than would be expected if the two purchases were independent.
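
The three metrics for Milk → Bread can be recomputed directly from the transaction list. This sketch assumes the standard definition lift = confidence / Support(consequent); the helper `support` is illustrative and not part of the notebook.

```python
# Same transactions as in Step 1
transactions = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Milk'],
    ['Bread', 'Milk', 'Butter'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Bread', 'Eggs', 'Juice'],
    ['Milk', 'Cheese'],
    ['Bread', 'Milk', 'Eggs'],
    ['Juice', 'Apples'],
    ['Bread', 'Milk'],
    ['Eggs', 'Butter'],
]


def support(*items):
    """Fraction of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)


s_milk = support('Milk')            # antecedent support
s_bread = support('Bread')          # consequent support
s_both = support('Milk', 'Bread')   # support of the rule

confidence = s_both / s_milk        # how often Bread appears when Milk does
lift = confidence / s_bread         # confidence relative to Bread's baseline rate

print(f"support={s_both:.2f} confidence={confidence:.3f} lift={lift:.3f}")
# prints: support=0.50 confidence=0.833 lift=1.389
```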

---

### Rule 2: *Bread → Milk*

**Support**
The support of this rule is also **0.5**, meaning Bread and Milk appear together in **50% of transactions**.

**Confidence**

$$
\text{Confidence} = \frac{\text{Support(Bread ∩ Milk)}}{\text{Support(Bread)}}
$$

Here:

* Support(Bread ∩ Milk) = 0.5 (Bread and Milk together in 50% of transactions)
* Support(Bread) = 0.6 (Bread appears in 60% of transactions, with or without other items)

$$
\text{Confidence} = \frac{0.5}{0.6} \approx 0.83 \ (83\%)
$$

**Interpretation:**
When Bread is purchased, Milk is also purchased **83% of the time**.

**Lift**
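Lift is symmetric in the antecedent and consequent, so it has the same value as for Rule 1:

$$
\text{Lift} = \frac{\text{Confidence}}{\text{Support(Milk)}} = \frac{0.833}{0.6} \approx 1.39
$$

This means customers who buy Bread are **1.39 times as likely** to also buy Milk as customers in general (83% versus the 60% baseline).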

---

### Why Use min\_threshold=0.7?

Setting **min\_threshold=0.7** together with `metric="confidence"` keeps only rules whose confidence is at least 0.7, that is, rules whose consequent appears in **at least 70%** of the transactions that contain the antecedent. This filters out weaker rules, where the consequent follows the antecedent less consistently.
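
To see which candidate rules the 0.7 cutoff removes, the sketch below computes the confidence of every one-to-one rule that can be formed from the two frequent pairs found in Step 2. It is a hand-rolled illustration rather than a call to mlxtend, and the names `frequent_pairs`, `kept`, and `dropped` are hypothetical.

```python
from itertools import permutations

# Same transactions as in Step 1
transactions = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Milk'],
    ['Bread', 'Milk', 'Butter'],
    ['Eggs', 'Cheese', 'Butter'],
    ['Bread', 'Eggs', 'Juice'],
    ['Milk', 'Cheese'],
    ['Bread', 'Milk', 'Eggs'],
    ['Juice', 'Apples'],
    ['Bread', 'Milk'],
    ['Eggs', 'Butter'],
]


def support(items):
    """Fraction of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)


# The two frequent 2-itemsets reported by Apriori in Step 2
frequent_pairs = [('Bread', 'Milk'), ('Bread', 'Eggs')]
min_confidence = 0.7

kept, dropped = [], []
for pair in frequent_pairs:
    # Each pair yields two candidate rules, one per direction
    for antecedent, consequent in permutations(pair):
        confidence = support(pair) / support([antecedent])
        rule = (antecedent, consequent, round(confidence, 3))
        (kept if confidence >= min_confidence else dropped).append(rule)

print("Kept:   ", kept)
print("Dropped:", dropped)
```

Both Bread/Milk rules pass with confidence 0.833, while Bread → Eggs (0.5) and Eggs → Bread (0.6) fall below the threshold, which is why only two rules appear in the output of Step 3.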


