# **Association Rule Mining**
Association rule mining (ARM) finds frequently occurring if-then patterns in data. Its output is a set of rules of the form {A} → {B}, each describing a combination of items (features) that co-occur frequently.

Association rule mining is a form of unsupervised learning: the data carries no labels or target values, so there are no "correct answers" to learn from.

In [8]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
transactions = [
    ['Milk', 'Egg', 'Bread', 'Butter'],
    ['Milk', 'Butter', 'Egg', 'Ketchup'],
    ['Bread', 'Butter', 'Ketchup'],
    ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Cookies'],
    ['Milk', 'Bread', 'Butter', 'Cookies'],
    ['Milk', 'Cookies'],
    ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Egg', 'Cookies'],
    ['Milk', 'Butter', 'Bread'],
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread', 'Cookies', 'Ketchup']
]

encoder = TransactionEncoder()
encoded_transactions = encoder.fit(transactions).transform(transactions)
df = pd.DataFrame(encoded_transactions, columns=encoder.columns_)

# **Apriori Algorithm**
The Apriori algorithm is a classic association rule learning algorithm used in data mining to identify frequent itemsets and derive association rules from them. It is based on the Apriori principle (downward closure): if a set of items (itemset) is frequent, then every subset of it is also frequent. Equivalently, if an itemset is infrequent, no superset of it can be frequent, so the algorithm can discard such candidates without counting them.
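
The level-wise search behind Apriori can be sketched in plain Python. This is a minimal illustration, not how `mlxtend` implements it: the function name `apriori_sketch` and the helper `keep_frequent` are hypothetical, and the transactions are copied from the first cell so the block runs on its own.

```python
from itertools import combinations

transactions = [
    ['Milk', 'Egg', 'Bread', 'Butter'], ['Milk', 'Butter', 'Egg', 'Ketchup'],
    ['Bread', 'Butter', 'Ketchup'], ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Cookies'], ['Milk', 'Bread', 'Butter', 'Cookies'],
    ['Milk', 'Cookies'], ['Milk', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Egg', 'Cookies'], ['Milk', 'Butter', 'Bread'],
    ['Milk', 'Bread', 'Butter'], ['Milk', 'Bread', 'Cookies', 'Ketchup'],
]

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Support = fraction of baskets that contain the whole candidate.
        supports = {c: sum(c <= b for b in baskets) / n for c in candidates}
        return {c: s for c, s in supports.items() if s >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

frequent = apriori_sketch(transactions, min_support=0.33)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
```

On these twelve transactions the sketch should find nine frequent itemsets. The prune step is where the Apriori principle pays off: {Bread, Cookies, Milk} is never counted, because its subset {Cookies, Milk} is already known to be infrequent.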

In [9]:
# Applying the Apriori algorithm with support >=0.33
frequent_itemsets = apriori(df, min_support=0.33, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
    support               itemsets
0  0.833333                (Bread)
1  0.833333               (Butter)
2  0.416667              (Cookies)
3  0.750000                 (Milk)
4  0.750000        (Bread, Butter)
5  0.333333       (Bread, Cookies)
6  0.583333          (Bread, Milk)
7  0.583333         (Milk, Butter)
8  0.500000  (Bread, Milk, Butter)


In [10]:
# Generating the association rules with confidence of >=0.5
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

print("Association Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Association Rules:
        antecedents      consequents   support  confidence      lift
0           (Bread)         (Butter)  0.750000    0.900000  1.080000
1          (Butter)          (Bread)  0.750000    0.900000  1.080000
2         (Cookies)          (Bread)  0.333333    0.800000  0.960000
3           (Bread)           (Milk)  0.583333    0.700000  0.933333
4            (Milk)          (Bread)  0.583333    0.777778  0.933333
5            (Milk)         (Butter)  0.583333    0.777778  0.933333
6          (Butter)           (Milk)  0.583333    0.700000  0.933333
7     (Bread, Milk)         (Butter)  0.500000    0.857143  1.028571
8   (Bread, Butter)           (Milk)  0.500000    0.666667  0.888889
9    (Milk, Butter)          (Bread)  0.500000    0.857143  1.028571
10          (Bread)   (Milk, Butter)  0.500000    0.600000  1.028571
11           (Milk)  (Bread, Butter)  0.500000    0.666667  0.888889
12         (Butter)    (Bread, Milk)  0.500000    0.600000  1.028571


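By performing Apriori-based association rule mining, we can see which items are frequently bought together and how strongly they are associated.

Frequent Itemsets (min_support = 0.33)
- Bread & Butter have the highest pair support (0.75): they appear together in 9 of the 12 transactions.
- Milk & Butter and Milk & Bread each appear together in 7 of the 12 transactions (support 0.58).
- Bread & Cookies only just clear the threshold (support 0.33). Cookies & Milk co-occur in just 3 of 12 transactions (0.25), so that pair is not a frequent itemset.

Association Rules (min_confidence = 0.5)
Examples of rules generated:
- {Bread} → {Butter} (confidence 0.90, lift 1.08): 90% of baskets with Bread also contain Butter, slightly above Butter's overall rate of 83%.
- {Milk} → {Butter} (confidence 0.78, lift 0.93): most Milk buyers also buy Butter, but the lift below 1 shows this is slightly less often than Butter's baseline rate, so Milk does not raise the chance of Butter.
- {Bread, Butter} → {Milk} (confidence 0.67, lift 0.89): two thirds of Bread & Butter baskets include Milk, again below Milk's baseline of 75%.
- {Cookies} → {Bread} (confidence 0.80, lift 0.96): the only rule involving Cookies. No {Cookies} → {Milk} rule is generated, because {Cookies, Milk} is not frequent.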


# **ECLAT**

ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) is a fast and efficient algorithm used in association rule mining to find frequent itemsets in transaction data.

Unlike the Apriori algorithm, which works on a horizontal format (row-wise transactions), ECLAT works on a vertical format, where each item is associated with the set of transaction IDs (a tidset) in which it appears. The support of an itemset is then the size of the intersection of its items' tidsets, so supports can be computed without rescanning the whole dataset.
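
The difference between the two layouts can be shown in a few lines. The sketch below is illustrative only: the `horizontal` dictionary restates, row by row, the same nine transactions (T1 to T9) used in the ECLAT cells of this section, and the variable names are assumptions.

```python
from collections import defaultdict

# Horizontal layout: one entry per transaction, listing its items.
horizontal = {
    "T1": {"Bread", "Butter", "Jam"},
    "T2": {"Butter", "Coke"},
    "T3": {"Butter", "Milk"},
    "T4": {"Bread", "Butter", "Coke"},
    "T5": {"Bread", "Milk", "Coke"},
    "T6": {"Butter", "Milk", "Coke"},
    "T7": {"Bread", "Coke"},
    "T8": {"Bread", "Butter", "Milk", "Coke", "Jam"},
    "T9": {"Bread", "Butter", "Milk"},
}

# Vertical layout: one entry per item, listing the transactions that contain it.
vertical_layout = defaultdict(set)
for tid, items in horizontal.items():
    for item in items:
        vertical_layout[item].add(tid)

# Support of {Bread, Butter} is the size of the tidset intersection.
bread_and_butter = vertical_layout["Bread"] & vertical_layout["Butter"]
print(sorted(bread_and_butter), len(bread_and_butter))
```

One design note: tidsets make each support lookup a single set intersection, but they can grow large for very frequent items, so the memory benefit depends on the data.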





In [11]:
import pandas as pd

df = pd.DataFrame({
    'Bread':  [1,0,0,1,1,0,1,1,1],
    'Butter': [1,1,1,1,0,1,0,1,1],
    'Milk':   [0,0,1,0,1,1,0,1,1],
    'Coke':   [0,1,0,1,1,1,1,1,0],
    'Jam':    [1,0,0,0,0,0,0,1,0]}, index=[f"T{i+1}" for i in range(9)])

vertical = {item: set(df.index[df[item] == 1]) for item in df.columns}

In [12]:
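def eclat(prefix, items, min_sup, results):
    """Depth-first search for frequent itemsets over (item, tidset) pairs.

    Appends (itemset, support_count) tuples to `results`.
    """
    for i, (item, tids) in enumerate(items):
        support = len(tids)  # support count = size of the transaction-ID set
        if support < min_sup:
            continue
        new_prefix = prefix + [item]
        results.append((new_prefix, support))
        # Extend only with later items so each itemset is generated once;
        # intersecting tidsets gives the transactions that contain both.
        new_items = []
        for other_item, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_sup:
                new_items.append((other_item, common))
        eclat(new_prefix, new_items, min_sup, results)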

In [13]:
min_sup = 2  # minimum support as an absolute transaction count (2 of 9), not a fraction
results = []
eclat([], list(vertical.items()), min_sup, results)

print(f" Frequent Itemsets using ECLAT (support ≥ {min_sup}):")
for items, sup in results:
    print(f"Items: {items}, Support: {sup}")

 Frequent Itemsets using ECLAT (support ≥ 2):
Items: ['Bread'], Support: 6
Items: ['Bread', 'Butter'], Support: 4
Items: ['Bread', 'Butter', 'Milk'], Support: 2
Items: ['Bread', 'Butter', 'Coke'], Support: 2
Items: ['Bread', 'Butter', 'Jam'], Support: 2
Items: ['Bread', 'Milk'], Support: 3
Items: ['Bread', 'Milk', 'Coke'], Support: 2
Items: ['Bread', 'Coke'], Support: 4
Items: ['Bread', 'Jam'], Support: 2
Items: ['Butter'], Support: 7
Items: ['Butter', 'Milk'], Support: 4
Items: ['Butter', 'Milk', 'Coke'], Support: 2
Items: ['Butter', 'Coke'], Support: 4
Items: ['Butter', 'Jam'], Support: 2
Items: ['Milk'], Support: 5
Items: ['Milk', 'Coke'], Support: 3
Items: ['Coke'], Support: 6
Items: ['Jam'], Support: 2


The result is a list of frequent itemsets with their support counts, for example:

- ['Bread'] appears in 6 transactions
- ['Bread', 'Butter'] appear together in 4 transactions
- ['Milk', 'Coke'] appear together in 3 transactions
- ... and so on.

These combinations meet the minimum support threshold. That makes them frequent, not necessarily statistically significant, but they are useful starting points for:

- Market basket analysis (e.g., "People who buy Bread also buy Butter")
- Recommendation systems
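
ECLAT itself only returns itemsets and counts, so a statement such as "People who buy Bread also buy Butter" still has to be checked as a rule. A minimal sketch, assuming the support counts printed above and 9 transactions in total; the `support_count` dictionary and `rule_metrics` helper are hypothetical names introduced here for illustration.

```python
n_transactions = 9

# Support counts copied from the ECLAT output above.
support_count = {
    frozenset(["Bread"]): 6,
    frozenset(["Butter"]): 7,
    frozenset(["Bread", "Butter"]): 4,
}

def rule_metrics(antecedent, consequent):
    """Return (confidence, lift) for antecedent -> consequent from support counts."""
    a, c = frozenset(antecedent), frozenset(consequent)
    confidence = support_count[a | c] / support_count[a]
    lift = confidence / (support_count[c] / n_transactions)
    return confidence, lift

confidence, lift = rule_metrics(["Bread"], ["Butter"])
print(f"Bread -> Butter: confidence={confidence:.3f}, lift={lift:.3f}")
```

For this dataset the confidence is 4/6 and the lift is 6/7, which is below 1: Butter is bought slightly less often with Bread than it is overall, so the pair being frequent does not by itself show a positive association.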