# Association Rule Mining
- Generate synthetic transactional data (e.g., 20-50 transactions, each a list of 3-8 items from a pool of 20 items like ['milk', 'bread', 'beer', 'diapers', 'eggs', etc.]; use random.choices to create baskets with patterns, like frequent co-occurrences). Use Python lists or pandas. Include generation code.
2. Analyze: Discuss one rule's implications (e.g., for retail recommendations).

In [2]:
# Import libraries
import pandas as pd
import random
from mlxtend.frequent_patterns import apriori, association_rules
import warnings
warnings.filterwarnings("ignore")

# Generate synthetic transaction data
random.seed(42)
num_transactions = random.randint(20, 50)
item_pool = ['milk', 'bread', 'beer', 'diapers', 'eggs', 'fruit', 'vegetables', 'snacks', 'cereal', 'juice',
            'toilet paper', 'detergent', 'meat', 'fish', 'pasta', 'rice', 'beans', 'soda', 'water', 'coffee']
transactions = []

for _ in range(num_transactions):
    basket_size = random.randint(3, 8)
    basket = random.choices(item_pool, k=basket_size)
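    # random.choices samples with replacement, so removing duplicates
    # can leave a basket with fewer than basket_size items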
    transactions.append(list(set(basket)))

# Convert to DataFrame
df = pd.DataFrame({'Transaction': range(1, num_transactions + 1),
                'Items': transactions})
df.head()

,Transaction,Items
0,1,"[fruit, eggs, milk]"
1,2,"[pasta, bread, eggs, beer, toilet paper, deter..."
2,3,"[eggs, toilet paper, detergent, meat, diapers,..."
3,4,"[coffee, fish, vegetables, diapers]"
4,5,"[fruit, snacks, beer, vegetables]"
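
The baskets above are drawn uniformly from the item pool, so any co-occurrence in them arises by chance rather than from a built-in pattern. A minimal sketch of one way to plant co-occurrences is shown here. The `planted_pairs` list, the 0.6 probability, the fixed count of 40 baskets, and the helper name `make_basket` are illustrative assumptions and not part of the original notebook. It also uses `random.sample` for the filler items so that each basket keeps its intended size.

```python
import random

item_pool = ['milk', 'bread', 'beer', 'diapers', 'eggs', 'fruit', 'vegetables',
             'snacks', 'cereal', 'juice', 'toilet paper', 'detergent', 'meat',
             'fish', 'pasta', 'rice', 'beans', 'soda', 'water', 'coffee']

# Hypothetical item pairs that should co-occur more often than chance.
planted_pairs = [('beer', 'diapers'), ('milk', 'bread')]

def make_basket(rng, pair_prob=0.6):
    """Return a basket of 3-8 distinct items, sometimes seeded with a planted pair."""
    basket_size = rng.randint(3, 8)
    basket = set()
    if rng.random() < pair_prob:
        basket.update(rng.choice(planted_pairs))
    # Fill the remaining slots without replacement, so the size is exact.
    remaining = [item for item in item_pool if item not in basket]
    basket.update(rng.sample(remaining, basket_size - len(basket)))
    return sorted(basket)

rng = random.Random(42)
patterned_transactions = [make_basket(rng) for _ in range(40)]
```

Feeding `patterned_transactions` through the same encoding and mining steps should surface the planted pairs among the higher-lift rules, although the exact values depend on the seed and the chosen probability.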


Prepare Data for Mining - One-Hot Encoding

In [3]:
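# Convert to one-hot encoding: one boolean column per item, one row per transaction
# (boolean values avoid the non-bool input warning in recent mlxtend versions)
all_items = sorted(set(item for sublist in transactions for item in sublist))
encoded_df = pd.DataFrame(False, index=range(len(transactions)), columns=all_items)

# Mark each item that appears in the basket
for idx, basket in enumerate(transactions):
    for item in basket:
        encoded_df.loc[idx, item] = True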

Apply the Apriori algorithm to find frequent itemsets

In [4]:
# Apply Apriori
frequent_itemsets = apriori(encoded_df, min_support=0.1, use_colnames=True)
frequent_itemsets.head()

,support,itemsets
0,0.175,(beans)
1,0.125,(beer)
2,0.35,(bread)
3,0.125,(cereal)
4,0.175,(coffee)
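
The `support` column is the fraction of transactions that contain every item in the itemset. The values above are consistent with 40 transactions (for example, 0.175 = 7/40). The sketch below spells out that definition in plain Python on a small hypothetical basket list, not the notebook's data. The helper names `support` and `frequent_itemsets_bruteforce` are assumptions. The enumeration is exhaustive rather than Apriori's pruned level-wise search, so it only illustrates what "frequent" means.

```python
from itertools import combinations

# Hypothetical toy baskets, for illustration only.
toy_transactions = [
    {'milk', 'bread', 'eggs'},
    {'milk', 'bread'},
    {'beer', 'diapers', 'bread'},
    {'beer', 'diapers'},
    {'milk', 'eggs'},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def frequent_itemsets_bruteforce(transactions, min_support, max_len=2):
    """Enumerate itemsets of up to `max_len` items and keep the frequent ones."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

toy_frequent = frequent_itemsets_bruteforce(toy_transactions, min_support=0.4)
for itemset, s in sorted(toy_frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

Apriori reaches the same itemsets more cheaply by skipping any candidate that has an infrequent subset.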


Generating association rules
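
Each rule A -> B is scored from three supports: confidence = support(A and B together) / support(A), and lift = confidence / support(B). A lift above 1 means A and B appear together more often than they would if they were independent. The sketch below applies these formulas to the rule (juice) -> (fruit, fish) from this notebook's rule table. The helper name `rule_metrics` is an assumption, and the input supports are back-derived from the reported support, confidence and lift rather than read from the data directly.

```python
def rule_metrics(support_ab, support_a, support_b):
    """Return (confidence, lift) for the rule A -> B from three supports."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    return confidence, lift

# Back-derived from the rule table:
#   support(juice, fruit, fish) = 0.1
#   support(juice)              = 0.1 / 0.4 = 0.25
#   support(fruit, fish)        = 0.4 / 4.0 = 0.1
confidence, lift = rule_metrics(support_ab=0.1, support_a=0.25, support_b=0.1)
print(f"confidence={confidence:.2f}, lift={lift:.2f}")  # confidence=0.40, lift=4.00
```

Swapping the two sides gives the rule (fruit, fish) -> (juice), which has the same lift but confidence 1.0, because lift is symmetric and confidence is not. A support of 0.1 over 40 transactions means these rules rest on only 4 baskets, so in uniformly random data a lift this high is more plausibly a small-sample effect than a real purchasing pattern.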

In [5]:
# Generating association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules = rules.sort_values(by="lift", ascending=False)

print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
if not rules.empty:
    best_rule = rules.iloc[0]
    antecedent = ', '.join(list(best_rule['antecedents']))
    consequent = ', '.join(list(best_rule['consequents']))
    print("\nExample Rule Analysis:")
    print(f"If a customer buys [{antecedent}], they are likely to also buy [{consequent}] "
        f"(confidence: {best_rule['confidence']:.2f}, lift: {best_rule['lift']:.2f}).")
    print("Implication: This could be used in retail for product bundling or targeted recommendations.")


Association Rules:
       antecedents     consequents  support  confidence      lift
75         (juice)   (fruit, fish)      0.1    0.400000  4.000000
74   (fruit, fish)         (juice)      0.1    1.000000  4.000000
72  (juice, fruit)          (fish)      0.1    1.000000  4.000000
77          (fish)  (juice, fruit)      0.1    0.400000  4.000000
73   (juice, fish)         (fruit)      0.1    0.800000  2.666667
..             ...             ...      ...         ...       ...
10         (bread)  (toilet paper)      0.1    0.285714  1.038961
13         (water)         (bread)      0.1    0.363636  1.038961
12         (bread)         (water)      0.1    0.285714  1.038961
27       (diapers)          (soda)      0.1    0.333333  1.025641
26          (soda)       (diapers)      0.1    0.307692  1.025641

[78 rows x 5 columns]

Example Rule Analysis:
If a customer buys [juice], they are likely to also buy [fruit, fish] (confidence: 0.40, lift: 4.00).
Implication: This could be used in reta