
Association Mining

In [2]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Step 1: Build the example dataset (five market-basket transactions)
data = {
    'TransactionID': [1, 2, 3, 4, 5],
    'Items': [
        ['Bread', 'Milk'],
        ['Bread', 'Diaper', 'Beer', 'Eggs'],
        ['Milk', 'Diaper', 'Beer', 'Coke'],
        ['Bread', 'Milk', 'Diaper', 'Beer'],
        ['Bread', 'Milk', 'Diaper', 'Coke']
    ]
}
df = pd.DataFrame(data)
print("Initial Data:\n", df)

# Step 2: Convert the dataset into the format the Apriori algorithm expects
# One-hot encode the item lists: one column per item, 1.0 if the
# transaction contains the item and 0.0 otherwise
df_items = df['Items'].apply(lambda x: pd.Series(1, index=x)).fillna(0)
print("\nOne-Hot Encoded Data:\n", df_items)

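# Step 3: Apply the Apriori algorithm to find frequent itemsets
# Use a minimum support threshold of 0.6 (itemset appears in at least 60% of transactions)
# Cast to bool first: recent mlxtend versions warn about non-boolean input
frequent_itemsets = apriori(df_items.astype(bool), min_support=0.6, use_colnames=True)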
print("\nFrequent Itemsets:\n", frequent_itemsets)

# Step 4: Generate association rules from the frequent itemsets
# Use a minimum confidence threshold of 0.7 (at least 70% confidence)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:\n", rules)

# Step 5: Interpret the results
# Display the rules in a simple format
for _, row in rules.iterrows():
    print(f"\nRule: {set(row['antecedents'])} -> {set(row['consequents'])}")
    print(f"Support: {row['support']:.2f}")
    print(f"Confidence: {row['confidence']:.2f}")
    print(f"Lift: {row['lift']:.2f}")


Initial Data:
    TransactionID                        Items
0              1                [Bread, Milk]
1              2  [Bread, Diaper, Beer, Eggs]
2              3   [Milk, Diaper, Beer, Coke]
3              4  [Bread, Milk, Diaper, Beer]
4              5  [Bread, Milk, Diaper, Coke]

One-Hot Encoded Data:
    Bread  Milk  Diaper  Beer  Eggs  Coke
0    1.0   1.0     0.0   0.0   0.0   0.0
1    1.0   0.0     1.0   1.0   1.0   0.0
2    0.0   1.0     1.0   1.0   0.0   1.0
3    1.0   1.0     1.0   1.0   0.0   0.0
4    1.0   1.0     1.0   0.0   0.0   1.0

Frequent Itemsets:
    support         itemsets
0      0.8          (Bread)
1      0.8           (Milk)
2      0.8         (Diaper)
3      0.6           (Beer)
4      0.6    (Milk, Bread)
5      0.6  (Diaper, Bread)
6      0.6   (Milk, Diaper)
7      0.6   (Beer, Diaper)
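
The frequent itemsets above can be cross-checked without mlxtend. The following is a minimal brute-force sketch, not the candidate-generation procedure that `apriori` uses internally: it enumerates item combinations by size and keeps those whose support reaches the threshold. The names `support`, `frequent`, and `MIN_SUPPORT` are illustrative and do not appear in the notebook.

```python
from itertools import combinations

# Same five transactions as in the notebook, written as sets
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]
MIN_SUPPORT = 0.6  # hypothetical constant mirroring min_support above


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    found_any = False
    for combo in combinations(items, size):
        s = support(set(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s
            found_any = True
    if not found_any:
        # If no itemset of this size is frequent, no larger one can be,
        # because every superset has support no higher than its subsets.
        break

for itemset, s in frequent.items():
    print(f"{s:.1f}  {sorted(itemset)}")
```

On this dataset the loop stops at size 3, since no three-item combination appears in at least three of the five transactions.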

Association Rules:
   antecedents consequents  antecedent support  consequent support  support  \
0      (Milk)     (Bread)                 0.8                 0.8  
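
The rule table above is cut off, so the metrics printed in Step 5 are not visible here. As a sketch under the standard definitions (support of the combined itemset, confidence = support(A and C) / support(A), lift = confidence / support(C)), the values for a rule can be recomputed directly from the transactions. The helper names `support` and `rule_metrics` are hypothetical and are not part of mlxtend.

```python
# Same five transactions as in the notebook, written as sets
transactions = [
    {'Bread', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    rule_support = support(antecedent | consequent)
    confidence = rule_support / support(antecedent)
    lift = confidence / support(consequent)
    return rule_support, confidence, lift


for antecedent, consequent in [({'Milk'}, {'Bread'}), ({'Beer'}, {'Diaper'})]:
    s, c, l = rule_metrics(antecedent, consequent)
    print(f"Rule: {antecedent} -> {consequent}")
    print(f"Support: {s:.2f}  Confidence: {c:.2f}  Lift: {l:.2f}")
```

For Milk -> Bread this works out to support 3/5 = 0.6, confidence 0.6/0.8 = 0.75, and lift 0.75/0.8, which is about 0.94. A lift below 1 means the two items co-occur slightly less often than they would if they were independent, even though the rule passes the 0.7 confidence threshold.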

