# Association Rule Mining on Groceries Dataset: PyCaret 2.3.5

This notebook mines market-basket **association rules** using **PyCaret 2.3.5** in a local Jupyter environment, on the [Groceries dataset](https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset) from Kaggle.

## 1) Environment check

In [1]:
import sys
print("Python:", sys.version)
import pycaret
print("PyCaret:", pycaret.__version__)

Python: 3.9.24 (main, Oct 10 2025, 08:51:58) 
[GCC 13.3.0]
PyCaret: 2.3.5


## 2) Load dataset

In [2]:
import pandas as pd

DATA_PATH = "./Groceries_dataset.csv"  # adjust to where the Kaggle CSV was downloaded

print("Reading:", DATA_PATH)
df = pd.read_csv(DATA_PATH)
print(df.shape)  # (rows, columns)
df.head()

Reading: ./Groceries_dataset.csv
(38765, 3)


|   | Member_number | Date | itemDescription |
|---|---|---|---|
| 0 | 1808 | 21-07-2015 | tropical fruit |
| 1 | 2552 | 05-01-2015 | whole milk |
| 2 | 2300 | 19-09-2015 | pip fruit |
| 3 | 1187 | 12-12-2015 | other vegetables |
| 4 | 3037 | 01-02-2015 | whole milk |
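The `Date` column holds day-first strings such as `21-07-2015`, and the notebook does not use it further. If dates are needed later, a minimal sketch for parsing them explicitly, assuming the `DD-MM-YYYY` pattern seen in the preview holds for every row (the `sample` frame below simply copies the five preview rows):

```python
import pandas as pd

# Rows copied from the preview above; in the notebook, apply the same call to df.
sample = pd.DataFrame({
    "Member_number": [1808, 2552, 2300, 1187, 3037],
    "Date": ["21-07-2015", "05-01-2015", "19-09-2015", "12-12-2015", "01-02-2015"],
    "itemDescription": ["tropical fruit", "whole milk", "pip fruit",
                        "other vegetables", "whole milk"],
})

# An explicit format removes day/month ambiguity (05-01-2015 is 5 January, not 1 May).
# With the default errors="raise", any row that deviates from DD-MM-YYYY fails loudly.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d-%m-%Y")

print(sample["Date"].dt.month.tolist())  # [7, 1, 9, 12, 2]
```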


## 3) Clean & prepare (transactional long format)

In [4]:
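# Normalise column names: 'Member_number' -> 'member_number', 'itemDescription' -> 'itemdescription'
df.columns = [c.strip().lower() for c in df.columns]
df.rename(columns={'itemdescription': 'item'}, inplace=True)
assert 'member_number' in df.columns and 'item' in df.columns, f"Columns present: {df.columns.tolist()}"

# Drop missing values *before* casting to str; otherwise NaN becomes the literal string "nan"
df = df.dropna(subset=['member_number', 'item']).copy()
df['item'] = df['item'].astype(str).str.strip()

# Each member's full purchase history is treated as one transaction (basket);
# an item bought several times by the same member is kept once.
tx = df[['member_number', 'item']].drop_duplicates().copy()

print("Unique transactions:", tx['member_number'].nunique())
print("Unique items:", tx['item'].nunique())
tx.head()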

Unique transactions: 3898
Unique items: 167


|   | member_number | item |
|---|---|---|
| 0 | 1808 | tropical fruit |
| 1 | 2552 | whole milk |
| 2 | 2300 | pip fruit |
| 3 | 1187 | other vegetables |
| 4 | 3037 | whole milk |
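Association rule miners typically work on a basket-by-item indicator matrix rather than on the long format of `tx`. A minimal sketch of that pivot with `pd.crosstab`, using a small hypothetical `toy` frame with the same `member_number`/`item` columns; it illustrates the general encoding and is not necessarily how PyCaret builds its matrix internally:

```python
import pandas as pd

# Hypothetical long-format transactions, same shape as `tx` (one row per member/item pair).
toy = pd.DataFrame({
    "member_number": [1, 1, 2, 2, 2, 3, 4],
    "item": ["whole milk", "rolls/buns",
             "whole milk", "yogurt", "rolls/buns",
             "whole milk",
             "yogurt"],
})

# One row per basket (member), one boolean column per item.
basket = pd.crosstab(toy["member_number"], toy["item"]).astype(bool)

# Support of a single item = share of baskets that contain it.
item_support = basket.mean().sort_values(ascending=False)
print(item_support)
```

Here `whole milk` appears in 3 of 4 baskets (support 0.75), while `rolls/buns` and `yogurt` each reach 0.5. On the real data the same pivot would give a 3898 × 167 matrix.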


## 4) Association rules with PyCaret (transactional)

In [None]:
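from pycaret.arules import setup, create_model

# Each member_number is one transaction; item holds the purchased product.
s = setup(data=tx, transaction_id='member_number', item_id='item', session_id=42)

# min_support: keep itemsets present in at least 0.1% of transactions;
# metric/threshold: keep only rules with confidence >= 0.3.
rules = create_model(metric='confidence', threshold=0.3, min_support=0.001)

# Strongest associations first (lift, then confidence, then support as tie-breakers).
rules_sorted = rules.sort_values(['lift', 'confidence', 'support'], ascending=False)
rules_sorted.head(20)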

| Description | Value |
|---|---|
| session_id | 42 |
| # Transactions | 3898 |
| # Items | 167 |
| Ignore Items | |
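`create_model` filters rules, and the cell above sorts them, by `support`, `confidence` and `lift`. For a rule A → B over N baskets: support = (baskets containing both A and B) / N, confidence = support(A ∪ B) / support(A), and lift = confidence / support(B). The sketch below computes these by hand on four hypothetical baskets (not taken from the dataset), which can help when sanity-checking a row of `rules_sorted`; exact fractions keep the arithmetic free of rounding:

```python
from fractions import Fraction

# Hypothetical baskets; each set is one member's distinct items.
baskets = [
    {"whole milk", "rolls/buns"},
    {"whole milk", "yogurt", "rolls/buns"},
    {"whole milk"},
    {"yogurt"},
]

def support(itemset: set[str]) -> Fraction:
    """Fraction of baskets containing every item in `itemset`."""
    hits = sum(1 for b in baskets if itemset <= b)
    return Fraction(hits, len(baskets))

def rule_metrics(antecedent: set[str], consequent: set[str]) -> dict[str, Fraction]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    lift = confidence / support(consequent)
    return {"support": both, "confidence": confidence, "lift": lift}

m = rule_metrics({"rolls/buns"}, {"whole milk"})
print({k: float(v) for k, v in m.items()})
```

For {rolls/buns} → {whole milk}, support is 2/4, confidence is 1 and lift is 4/3: whole milk is a third more likely in baskets with rolls/buns than across all baskets. A lift above 1 signals positive association; a lift near 1 means the items co-occur about as often as chance would predict.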


