**1. Load and Inspect the Dataset**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [2]:
df = pd.read_csv("/content/Groceries_dataset.csv")
print("Data Shape:", df.shape)
print(df.head())


Data Shape: (38765, 3)
   Member_number        Date   itemDescription
0           1808  21-07-2015    tropical fruit
1           2552  05-01-2015        whole milk
2           2300  19-09-2015         pip fruit
3           1187  12-12-2015  other vegetables
4           3037  01-02-2015        whole milk


**2. Preprocess Data**

In [4]:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
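Because `errors='coerce'` silently turns unparseable dates into `NaT`, it can be worth counting them before building transaction IDs. The snippet below is a minimal sketch on made-up toy values (not rows from the dataset); passing an explicit `format="%d-%m-%Y"` is an assumption based on the day-month-year strings shown in the preview above.

```python
import pandas as pd

# Toy values for illustration only: one valid date, one impossible date, one non-date.
toy = pd.DataFrame({"Date": ["21-07-2015", "31-02-2015", "not a date"]})

# An explicit format (assumed: day-month-year) avoids ambiguous day/month guessing.
toy["Date"] = pd.to_datetime(toy["Date"], format="%d-%m-%Y", errors="coerce")

# Rows that failed to parse become NaT; count them before grouping.
n_bad = int(toy["Date"].isna().sum())
print("Unparseable dates:", n_bad)
```

If the count is nonzero on the real data, those rows would otherwise end up grouped under a `NaT` transaction ID.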

**3. Group Items by Transaction**

In [5]:
# Combine Member_number and Date into a single "TransactionID" column
df["TransactionID"] = df["Member_number"].astype(str) + "_" + df["Date"].astype(str)

# Group by "TransactionID" to collect items
df_grouped = df.groupby("TransactionID")["itemDescription"].apply(list).reset_index(name='Items')

print("Number of unique transactions:", df_grouped.shape[0])
print(df_grouped.head())


Number of unique transactions: 14963
     TransactionID                                              Items
0  1000_2014-06-24                  [whole milk, pastry, salty snack]
1  1000_2015-03-15  [sausage, whole milk, semi-finished bread, yog...
2  1000_2015-05-27                         [soda, pickled vegetables]
3  1000_2015-07-24                     [canned beer, misc. beverages]
4  1000_2015-11-25                        [sausage, hygiene articles]


**4. Convert to Transaction List**

In [6]:
transactions = df_grouped["Items"].tolist()
print(f"Example transaction: {transactions[0]}")


Example transaction: ['whole milk', 'pastry', 'salty snack']


**5. Encode the Transactions**

In [7]:
!pip install mlxtend  # if not installed
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_array, columns=te.columns_)
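To make the encoding step concrete, here is a rough pure-Python sketch of the kind of boolean matrix one-hot encoding produces: one column per distinct item, one row per transaction. It is not `TransactionEncoder`'s actual implementation; the first two transactions are taken from the grouped output above and the third is a hypothetical one added for illustration.

```python
# Two transactions from the grouped output, plus one hypothetical transaction.
toy_transactions = [
    ["whole milk", "pastry", "salty snack"],
    ["soda", "pickled vegetables"],
    ["whole milk", "soda"],  # hypothetical, for illustration
]

# One column per distinct item, in sorted order.
toy_columns = sorted({item for t in toy_transactions for item in t})

# One boolean row per transaction: True where the item is present.
toy_rows = [[item in set(t) for item in toy_columns] for t in toy_transactions]

print(toy_columns)
for row in toy_rows:
    print(row)
```

Each column's mean over the rows is that item's support, which is what the Apriori step works from.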




**6. Apply Apriori to Find Frequent Itemsets**

In [12]:
from mlxtend.frequent_patterns import apriori

# min_support=0.005 keeps itemsets that appear in at least 0.5% of transactions
frequent_itemsets = apriori(df_encoded, min_support=0.005, use_colnames=True)
print(frequent_itemsets.head())


    support         itemsets
0  0.021386       (UHT-milk)
1  0.008087  (baking powder)
2  0.033950           (beef)
3  0.021787        (berries)
4  0.016574      (beverages)
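As a quick sanity check on what the support threshold means in absolute terms, the arithmetic below converts it to a transaction count, using the 14963 transactions reported in step 3. The UHT-milk count is an approximation recovered from the rounded support value printed above.

```python
import math

n_transactions = 14963  # number of unique transactions from step 3
min_support = 0.005     # threshold passed to apriori

# Smallest number of transactions an itemset must appear in to be kept.
min_count = math.ceil(min_support * n_transactions)

# Approximate count behind the printed support of (UHT-milk).
uht_milk_count = round(0.021386 * n_transactions)

print("Minimum transactions per itemset:", min_count)
print("Approximate UHT-milk transactions:", uht_milk_count)
```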


**7. Generate Association Rules**

In [15]:
from mlxtend.frequent_patterns import association_rules

rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
rules.sort_values("confidence", ascending=False, inplace=True)

print(rules.head(10))


        antecedents         consequents  antecedent support  \
0    (bottled beer)        (whole milk)            0.045312   
14        (sausage)        (whole milk)            0.060349   
7      (newspapers)        (whole milk)            0.038896   
4   (domestic eggs)        (whole milk)            0.037091   
6     (frankfurter)        (whole milk)            0.037760   
5     (frankfurter)  (other vegetables)            0.037760   
11           (pork)        (whole milk)            0.037091   
10      (pip fruit)        (whole milk)            0.049054   
3    (citrus fruit)        (whole milk)            0.053131   
15  (shopping bags)        (whole milk)            0.047584   

    consequent support   support  confidence      lift  representativity  \
0             0.157923  0.007151    0.157817  0.999330               1.0   
14            0.157923  0.008955    0.148394  0.939663               1.0   
7             0.157923  0.005614    0.144330  0.913926               1.0   
4 

**8. Filter & Interpret**

In [18]:
strong_rules = rules[(rules['lift'] > 1) & (rules['confidence'] > 0.5)]
print("Strong Rules with confidence > 0.5 and lift > 1:")
print(strong_rules)


Strong Rules with confidence > 0.5 and lift > 1:
Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, representativity, leverage, conviction, zhangs_metric, jaccard, certainty, kulczynski]
Index: []
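The empty result follows directly from the metric definitions: confidence is the rule's support divided by the antecedent's support, and lift is confidence divided by the consequent's support. The sketch below recomputes both for the top rule, (bottled beer) → (whole milk), from the rounded values printed in step 7, so the results match the table only up to rounding.

```python
# Values printed for (bottled beer) -> (whole milk) in the rules table.
antecedent_support = 0.045312  # share of transactions with bottled beer
consequent_support = 0.157923  # share of transactions with whole milk
rule_support = 0.007151        # share of transactions with both

# confidence = P(consequent | antecedent)
confidence = rule_support / antecedent_support

# lift = confidence relative to the consequent's baseline frequency
lift = confidence / consequent_support

# Same condition as the strong_rules filter.
passes_filter = (lift > 1) and (confidence > 0.5)

print(f"confidence = {confidence:.4f}, lift = {lift:.4f}, passes = {passes_filter}")
```

A lift just under 1 means buying bottled beer does not raise the chance of buying whole milk above its baseline, and since this is the highest-confidence rule, no rule can reach the 0.5 confidence cutoff.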


Part 3: We conducted association rule learning (Market Basket Analysis) on a groceries dataset. The raw data has one row per item purchased by a member on a given date, so we grouped rows by (Member_number, Date) to form 14,963 transactions. We then one-hot encoded the transactions into a boolean item matrix and applied the Apriori algorithm with min_support=0.005 to find frequently co-occurring sets of items (frequent itemsets). From these we generated association rules of the form "if a transaction includes item X, it tends to include item Y" and ranked them by confidence. Finally, we filtered on confidence > 0.5 and lift > 1, and no rule passed: the highest-confidence rule, (bottled beer) → (whole milk), has a confidence of about 0.158 and a lift of about 0.999, meaning bottled beer buyers are no more likely than average to buy whole milk. On this dataset the confidence threshold would have to be far lower than 0.5 for any rule to pass, and lift is the more informative metric because whole milk is frequent on its own.