In [1]:
# Cell 1: Install necessary library
!pip install mlxtend pandas



In [None]:
# Cell 2: Import Libraries and Prepare Data
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# 1. Load the dataset (List of Lists format)
dataset = [
    ['Bread', 'Milk', 'Eggs'],           # Trans ID 1
    ['Bread', 'Butter'],                 # Trans ID 2
    ['Milk', 'Diapers', 'Beer'],         # Trans ID 3
    ['Bread', 'Milk', 'Butter'],         # Trans ID 4
    ['Milk', 'Diapers', 'Bread'],        # Trans ID 5
    ['Beer', 'Diapers'],                 # Trans ID 6
    ['Bread', 'Milk', 'Eggs', 'Butter'], # Trans ID 7
    ['Eggs', 'Milk'],                    # Trans ID 8
    ['Bread', 'Diapers', 'Beer'],        # Trans ID 9
    ['Milk', 'Butter']                   # Trans ID 10
]

# 2. Encode the transaction data (One-Hot Encoding)
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)

# Create the DataFrame
df = pd.DataFrame(te_ary, columns=te.columns_)

# Display the full encoded DataFrame (all 10 transactions)
print("One-Hot Encoded Data:")
display(df)

One-Hot Encoded Data:


,Beer,Bread,Butter,Diapers,Eggs,Milk
0,False,True,False,False,True,True
1,False,True,True,False,False,False
2,True,False,False,True,False,True
3,False,True,True,False,False,True
4,False,True,False,True,False,True
5,True,False,False,True,False,False
6,False,True,True,False,True,True
7,False,False,False,False,True,True
8,True,True,False,True,False,False
9,False,False,True,False,False,True


## Explanation of One-Hot Encoded Data

The transaction data has been transformed into a binary matrix where:
- Each row represents a transaction (customer purchase).
- Each column represents an item (e.g., Bread, Milk).
- A value of `True` (or 1) means the item was purchased in that transaction; `False` (or 0) means it was not.

mlxtend's `apriori` function expects this one-hot (boolean) format as input. It counts how often columns are `True` in the same row to find items that are frequently bought together.

**Key Insights:**
- There are 10 transactions and 6 unique items.
- Milk appears in 7 transactions (70%), Bread in 6 (60%), Butter and Diapers in 4 each (40%), and Beer and Eggs in 3 each (30%).
- This encoding allows us to quantify how often items co-occur.
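The encoding and the item counts above can be double-checked without mlxtend. The following is a minimal sketch in plain Python, assuming the same `dataset` list as in Cell 2 (repeated here so the snippet runs on its own). The names `items`, `one_hot`, and `item_counts` are illustrative and not part of the notebook.

```python
from collections import Counter

dataset = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Butter'],
    ['Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Butter'],
    ['Milk', 'Diapers', 'Bread'],
    ['Beer', 'Diapers'],
    ['Bread', 'Milk', 'Eggs', 'Butter'],
    ['Eggs', 'Milk'],
    ['Bread', 'Diapers', 'Beer'],
    ['Milk', 'Butter'],
]

# Unique items in alphabetical order, matching the columns of the encoded table
items = sorted({item for basket in dataset for item in basket})

# One boolean row per transaction: True if the item is in that basket
one_hot = [[item in basket for item in items] for basket in dataset]

# Number of transactions that contain each item (set() guards against duplicates)
item_counts = Counter(item for basket in dataset for item in set(basket))

print(f"{len(dataset)} transactions, {len(items)} unique items")
for item in items:
    count = item_counts[item]
    print(f"{item}: {count} transactions ({count / len(dataset):.0%})")
```

Each row of `one_hot` should match the corresponding row of the encoded table, and the counts should match the percentages listed in the key insights.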

In [3]:
# Cell 3: Apply Apriori Algorithm
# 1. Generate Frequent Itemsets (Min Support = 0.2)
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)

# Display itemsets to verify support counts
print("\nFrequent Itemsets:")
print(frequent_itemsets)

# 2. Generate Association Rules (Min Confidence = 0.5)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# 3. Filter and Display specific columns: Support, Confidence, Lift
# We round the values to 2 decimal places for cleaner viewing
cols_to_keep = ['antecedents', 'consequents', 'support', 'confidence', 'lift']
clean_rules = rules[cols_to_keep].copy()
clean_rules = clean_rules.round(2)

print("\nAssociation Rules (Sorted by Lift):")
# Sorting by Lift helps us identify the strongest rules immediately
display(clean_rules.sort_values(by='lift', ascending=False))


Frequent Itemsets:
    support               itemsets
0       0.3                 (Beer)
1       0.6                (Bread)
2       0.4               (Butter)
3       0.4              (Diapers)
4       0.3                 (Eggs)
5       0.7                 (Milk)
6       0.3        (Diapers, Beer)
7       0.3        (Butter, Bread)
8       0.2       (Diapers, Bread)
9       0.2          (Eggs, Bread)
10      0.4          (Milk, Bread)
11      0.3         (Milk, Butter)
12      0.2        (Diapers, Milk)
13      0.3           (Milk, Eggs)
14      0.2  (Milk, Butter, Bread)
15      0.2    (Milk, Eggs, Bread)

Association Rules (Sorted by Lift):


,antecedents,consequents,support,confidence,lift
0,(Diapers),(Beer),0.3,0.75,2.5
1,(Beer),(Diapers),0.3,1.0,2.5
16,"(Milk, Bread)",(Eggs),0.2,0.5,1.67
18,(Eggs),"(Milk, Bread)",0.2,0.67,1.67
10,(Eggs),(Milk),0.3,1.0,1.43
17,"(Eggs, Bread)",(Milk),0.2,1.0,1.43
2,(Butter),(Bread),0.3,0.75,1.25
12,"(Milk, Bread)",(Butter),0.2,0.5,1.25
3,(Bread),(Butter),0.3,0.5,1.25
14,(Butter),"(Milk, Bread)",0.2,0.5,1.25


## Explanation of Apriori Results

### Frequent Itemsets
These are combinations of items that appear together in at least 20% of transactions (min_support=0.2). For example:
- `(Milk)` has support 0.7: Milk appears in 70% of transactions.
- `(Milk, Bread)` has support 0.4: Both Milk and Bread appear together in 40% of transactions.

Higher support means the itemset is more common.
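To make the definition of support concrete, here is a small sketch that computes it directly from the raw transactions and then enumerates every item combination. Note that this is a brute-force check, not the Apriori algorithm itself (Apriori avoids testing supersets of infrequent itemsets). The helper names `support` and `frequent_itemsets_brute_force` are hypothetical, and `dataset` is repeated from Cell 2.

```python
from itertools import combinations

dataset = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Butter'],
    ['Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Butter'],
    ['Milk', 'Diapers', 'Bread'],
    ['Beer', 'Diapers'],
    ['Bread', 'Milk', 'Eggs', 'Butter'],
    ['Eggs', 'Milk'],
    ['Bread', 'Diapers', 'Beer'],
    ['Milk', 'Butter'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def frequent_itemsets_brute_force(transactions, min_support):
    """Test every item combination and keep those that meet min_support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

found = frequent_itemsets_brute_force(dataset, min_support=0.2)
print("support(Milk, Bread) =", support({'Milk', 'Bread'}, dataset))
print("number of frequent itemsets:", len(found))
```

On a dataset this small, exhaustive enumeration is cheap and should yield the same itemsets as the `apriori` output. With many items the number of combinations grows exponentially, which is the problem Apriori's pruning addresses.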

### Association Rules
Rules show "if-then" relationships, like "If someone buys X, they are likely to buy Y." Only rules with a confidence of at least 0.5 are kept, and the table is sorted by lift in descending order so the strongest associations appear first.

**Columns Explained:**
- **Antecedents**: The "if" part (e.g., {Bread}).
- **Consequents**: The "then" part (e.g., {Milk}).
- **Support**: Fraction of transactions containing both antecedents and consequents.
- **Confidence**: Probability that consequents appear given antecedents (e.g., 0.67 means 67% of Bread buyers also buy Milk).
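- **Lift**: Confidence divided by the consequent's overall support, i.e. how many times more likely the consequent is when the antecedent is present than it is across all transactions. Lift > 1 indicates a positive association, lift = 1 indicates independence, and lift < 1 indicates a negative association.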

**Example Rule Interpretation:**
- Rule: {Bread} → {Milk} (Support: 0.4, Confidence: 0.67, Lift: 0.95)
  - 40% of all transactions include both Bread and Milk.
  - If Bread is bought, there's a 67% chance Milk is also bought.
  - Lift 0.95 (<1) suggests a slight negative association: Bread buyers pick up Milk a little less often (67%) than customers overall (70%).
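The three numbers in this example can be recomputed by hand from the transactions. The sketch below assumes the standard definitions (confidence = support of both sides / support of the antecedent, lift = confidence / support of the consequent). `rule_metrics` is a hypothetical helper, and `dataset` is repeated from Cell 2.

```python
dataset = [
    ['Bread', 'Milk', 'Eggs'],
    ['Bread', 'Butter'],
    ['Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Butter'],
    ['Milk', 'Diapers', 'Bread'],
    ['Beer', 'Diapers'],
    ['Bread', 'Milk', 'Eggs', 'Butter'],
    ['Eggs', 'Milk'],
    ['Bread', 'Diapers', 'Beer'],
    ['Milk', 'Butter'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift

sup, conf, lift = rule_metrics({'Bread'}, {'Milk'}, dataset)
print(f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
# prints: support=0.40, confidence=0.67, lift=0.95
```

Swapping in `{'Beer'}` and `{'Diapers'}` gives the highest-lift rule in the table, which is a quick way to confirm that the hand calculation and the mlxtend output agree.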

**How to Make This Understandable to Others:**
- **Visualize**: Use scatter plots (Lift vs. Support) or network graphs to show rules.
- **Simplify Language**: Avoid jargon; say "Customers who buy Bread often also buy Milk."
- **Business Context**: Explain implications, e.g., "Place Bread and Milk near each other in the store."
- **Filter Rules**: Focus on high-lift, high-confidence rules for actionable insights.
- **Interactive Demo**: Share the notebook or create a dashboard with filters.
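As one way to act on the "Simplify Language" and "Filter Rules" suggestions, the sketch below turns rules into plain sentences and keeps only those above illustrative thresholds. The helper `describe_rule` and the cut-offs (confidence >= 0.6, lift > 1.2) are assumptions chosen for this example. The rule values are copied from the rounded output table of Cell 3.

```python
def describe_rule(antecedent, consequent, confidence, lift):
    """Turn one rule into a plain-language sentence (hypothetical helper)."""
    if_part = " and ".join(sorted(antecedent))
    then_part = " and ".join(sorted(consequent))
    return (f"{confidence:.0%} of customers who buy {if_part} also buy {then_part}"
            f" ({lift:.2f}x the rate among all customers).")

# (antecedent, consequent, confidence, lift), taken from the rounded rules table
rules_table = [
    ({'Beer'}, {'Diapers'}, 1.0, 2.5),
    ({'Eggs'}, {'Milk', 'Bread'}, 0.67, 1.67),
    ({'Butter'}, {'Bread'}, 0.75, 1.25),
    ({'Bread'}, {'Butter'}, 0.5, 1.25),
]

# Keep only rules that clear both illustrative thresholds
sentences = [
    describe_rule(a, c, conf, lift)
    for a, c, conf, lift in rules_table
    if conf >= 0.6 and lift > 1.2
]

for sentence in sentences:
    print(sentence)
```

The last rule (Bread, then Butter) is dropped because its confidence of 0.5 falls below the cut-off, even though its lift matches that of the reverse rule.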

In [4]:
# Cell 4: Pretty-print Association Rules for Better Readability
# Convert frozensets to readable strings
rules_copy = rules.copy()
rules_copy['antecedents'] = rules_copy['antecedents'].apply(lambda x: ', '.join(list(x)))
rules_copy['consequents'] = rules_copy['consequents'].apply(lambda x: ', '.join(list(x)))

# Display the cleaned rules
print("Readable Association Rules (Sorted by Lift):")
display(rules_copy[cols_to_keep].sort_values(by='lift', ascending=False))

Readable Association Rules (Sorted by Lift):


,antecedents,consequents,support,confidence,lift
1,Beer,Diapers,0.3,1.0,2.5
0,Diapers,Beer,0.3,0.75,2.5
16,"Milk, Bread",Eggs,0.2,0.5,1.666667
18,Eggs,"Milk, Bread",0.2,0.666667,1.666667
10,Eggs,Milk,0.3,1.0,1.428571
17,"Eggs, Bread",Milk,0.2,1.0,1.428571
12,"Milk, Bread",Butter,0.2,0.5,1.25
14,Butter,"Milk, Bread",0.2,0.5,1.25
3,Bread,Butter,0.3,0.5,1.25
2,Butter,Bread,0.3,0.75,1.25


In [None]:
# Part C: Interpretation
# 1. Identify the three strongest rules based on Lift
# The "Lift" metric measures how much more often the antecedent and consequent occur together than we would expect if they were statistically independent. A lift > 1 implies a positive relationship.
# Lift is symmetric (X -> Y and Y -> X share the same value), so each mirrored pair is listed once, in its higher-confidence direction.

# --------------------------------
# Based on the calculations in Part B, the three strongest rules are:
# Beer -> Diapers (Lift: 2.5)
# Explanation: Beer buyers are 2.5 times as likely to buy diapers as the average customer (confidence 1.0 against a baseline diaper support of 0.4). This is the strongest association in the dataset and echoes the "young fathers" story from data mining folklore.
# Eggs -> (Bread, Milk) (Lift: 1.67)
# Explanation: Customers who buy Eggs are 1.67 times as likely as the average customer to have a basket containing both Bread and Milk. This suggests these items form a "Breakfast Bundle."
# Eggs -> Milk (Lift: 1.43)
# Explanation: Buying eggs raises the likelihood of buying milk from the 70% baseline to 100%. (Note: The confidence is 1.0, meaning every transaction in this dataset that contains eggs also contains milk.)

# --------------------------------
# 2. Business Recommendations
# Based on the insights derived from the rules above:
# Product Placement (Cross-Merchandising):
# Since Beer and Diapers have the highest lift (2.5), the supermarket should place these items closer together, or place high-margin impulse items (like savory snacks) between the beer and diaper aisles to capitalize on this specific customer traffic flow.
# Bundling and Promotions:
# Create a "Breakfast Essentials" bundle offering a slight discount when Bread, Milk, and Eggs are bought together. Since Eggs -> Milk has 100% confidence and (Bread, Milk) -> Eggs has a lift of 1.67, marketing these items together in a flyer or near the entrance is likely to increase the basket size for customers who intended to buy only one of the items.
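The choice of the three strongest rules can be reproduced in a few lines. Because lift is identical in both directions of a rule, the sketch below ranks by lift, breaks ties by confidence, and keeps only the first rule seen for each distinct set of items. The helper `top_distinct_rules` is hypothetical, and the input covers only the ten rules shown in the rounded output table of Cell 3.

```python
# (antecedent, consequent, confidence, lift), copied from the rounded rules table
rules_table = [
    (('Diapers',), ('Beer',), 0.75, 2.5),
    (('Beer',), ('Diapers',), 1.0, 2.5),
    (('Milk', 'Bread'), ('Eggs',), 0.5, 1.67),
    (('Eggs',), ('Milk', 'Bread'), 0.67, 1.67),
    (('Eggs',), ('Milk',), 1.0, 1.43),
    (('Eggs', 'Bread'), ('Milk',), 1.0, 1.43),
    (('Butter',), ('Bread',), 0.75, 1.25),
    (('Milk', 'Bread'), ('Butter',), 0.5, 1.25),
    (('Bread',), ('Butter',), 0.5, 1.25),
    (('Butter',), ('Milk', 'Bread'), 0.5, 1.25),
]

def top_distinct_rules(rules, k=3):
    """Return the k highest-lift rules, one per distinct set of items."""
    # Sort by lift, then confidence, both descending
    ranked = sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
    seen, picked = set(), []
    for antecedent, consequent, confidence, lift in ranked:
        items = frozenset(antecedent) | frozenset(consequent)
        if items in seen:
            continue  # a rule over the same items was already picked
        seen.add(items)
        picked.append((antecedent, consequent, confidence, lift))
        if len(picked) == k:
            break
    return picked

for antecedent, consequent, confidence, lift in top_distinct_rules(rules_table):
    print(f"{', '.join(antecedent)} -> {', '.join(consequent)} "
          f"(confidence {confidence}, lift {lift})")
```

Deduplicating by item set is a design choice for presentation. Without it, the top three rows by lift would be the two mirrored Beer/Diapers rules plus one of the Eggs, Milk, Bread rules.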