# Project Introduction
Market Basket Analysis (MBA) uncovers which items customers buy together. The business objectives here are to identify cross-selling opportunities, define product bundles, and inform store layout decisions to increase average basket value.

In [None]:
# Imports & Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Plot style
sns.set(style='whitegrid')

In [None]:
# Load & Inspect Dataset
import os
candidates = ['../data/groceries.csv', '../data/Groceries_dataset.csv', 'data/groceries.csv', 'data/Groceries_dataset.csv']
for p in candidates:
    if os.path.exists(p):
        path = p
        break
else:
    raise FileNotFoundError('Dataset not found in expected locations.')

data = pd.read_csv(path)
print(f'Loaded: {path}')
print('Shape:', data.shape)
print('\nFirst rows:')
print(data.head())
print('\nInfo:')
data.info()  # info() prints its own summary and returns None, so don't wrap it in print()

# We'll group items by Member_number to form transactions for the market-basket analysis.

# Brief explanation: the dataset contains transaction records with columns Member_number, Date, itemDescription.

**Dataset summary (plain English):** The table lists individual purchases, one item per row. We aggregate by `Member_number`, so each transaction is the list of all items a customer bought across the whole dataset, not a single visit.

In [None]:
# Data Preparation
transactions = data.groupby('Member_number')['itemDescription'].apply(list).tolist()
print('Number of transactions:', len(transactions))
print('\nExample transactions (first 5):')
for t in transactions[:5]:
    print('-', t)

# The final structure `transactions` is a list of lists, ready for TransactionEncoder.

**Data preparation summary:** We grouped rows by `Member_number` to form transactions (lists of items). This gives one record per customer, combining all of that member's purchases, which is the list-of-lists input that frequent-pattern mining needs.
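The grouping above treats each customer's full purchase history as one basket. If per-visit baskets are wanted instead, one option (an assumption, not what this notebook does) is to key on both `Member_number` and `Date`; in pandas that would be `data.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).tolist()`. The dependency-free sketch below contrasts the two groupings on a few made-up rows; `build_baskets` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows in the (Member_number, Date, itemDescription) layout; not real data.
rows = [
    (1000, "2015-03-15", "whole milk"),
    (1000, "2015-03-15", "yogurt"),
    (1000, "2015-07-24", "rolls/buns"),
    (1001, "2015-01-02", "sausage"),
    (1001, "2015-01-02", "whole milk"),
]


def build_baskets(rows, per_visit):
    """Hypothetical helper: group item rows into baskets.

    per_visit=False keys on the member only (what the notebook does);
    per_visit=True keys on (member, date), i.e. one basket per shopping trip.
    """
    baskets = defaultdict(list)  # dicts keep insertion order, so output order is stable
    for member, date, item in rows:
        key = (member, date) if per_visit else member
        baskets[key].append(item)
    return list(baskets.values())


per_customer = build_baskets(rows, per_visit=False)
per_visit = build_baskets(rows, per_visit=True)
print(len(per_customer), per_customer)  # 2 baskets, one per member
print(len(per_visit), per_visit)        # 3 baskets, one per (member, date)
```

Per-visit baskets are smaller and more numerous, so the same min_support = 0.02 may keep far fewer itemsets; the threshold would likely need revisiting if the grouping changes.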

## One-Hot Encoding
Use TransactionEncoder to convert transaction lists into a boolean (one-hot) basket dataframe: each column is an item and each row is a transaction (True if item purchased).

In [None]:
# One-Hot Encoding
encoder = TransactionEncoder()
encoded_array = encoder.fit(transactions).transform(transactions)
basket_df = pd.DataFrame(encoded_array, columns=encoder.columns_)

print('Basket DataFrame shape:', basket_df.shape)
print('\nFirst rows:')
print(basket_df.head())

**One-hot encoding summary:** We converted the transaction lists to a boolean table where each column is an item and each row is one customer's basket; True means that customer bought the item. This is the input format `apriori` expects.
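To make the one-hot layout concrete, here is a plain-Python sketch of what the encoding produces: one column per distinct item, sorted alphabetically (which, to our understanding, matches how `TransactionEncoder` orders `columns_`), and one boolean row per transaction. The helper `one_hot` and the toy baskets are hypothetical; in the notebook, `TransactionEncoder` does this work.

```python
def one_hot(transactions):
    """Hypothetical helper: return (columns, rows) for a boolean basket table.

    columns: sorted distinct items; rows: one list of booleans per transaction.
    """
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)  # duplicates in a basket collapse to a single True
        rows.append([col in present for col in columns])
    return columns, rows


toy = [["whole milk", "yogurt"], ["rolls/buns", "whole milk", "whole milk"], ["sausage"]]
columns, rows = one_hot(toy)
print(columns)
for r in rows:
    print(r)
```

Note that the repeated "whole milk" in the second basket still yields a single True: the table records presence, not quantity.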

## Frequent Itemset Mining
Apply the Apriori algorithm with min_support = 0.02 to find frequent itemsets: combinations of items that appear together in at least 2% of transactions.

In [None]:
# Frequent Itemset Mining
frequent_itemsets = apriori(basket_df, min_support=0.02, use_colnames=True)
frequent_itemsets = frequent_itemsets.sort_values(by='support', ascending=False).reset_index(drop=True)

print('Top frequent itemsets (by support):')
print(frequent_itemsets.head(10))

# Short explanation: Frequent itemsets show commonly co-occurring items in customer baskets.

**Frequent itemset summary:** Frequent itemsets are combinations of items that often appear together; support is the fraction of transactions that contain the itemset. Setting min_support = 0.02 keeps only itemsets present in at least 2% of transactions, which filters out rare combinations.
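The level-wise idea behind Apriori can be sketched without mlxtend: count single items, keep those with support at or above `min_support`, then build larger candidates only from itemsets that were already frequent, pruning any candidate that has an infrequent subset. This is a simplified illustration for small inputs, not mlxtend's implementation; `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Hypothetical level-wise Apriori.

    Returns {frozenset(itemset): support}, where support is the fraction of
    transactions containing the itemset, for every itemset with support >= min_support.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


toy = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk", "eggs"], ["bread"]]
frequent = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {bread, eggs} appears in only one of four baskets, so it is dropped at level 2, and the pruning step then rules out {milk, bread, eggs} without counting it.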

## Association Rule Mining
Generate association rules from the frequent itemsets and keep those with lift > 1. A lift above 1 means the consequent appears with the antecedent more often than it would if the two were bought independently.

In [None]:
# Association Rule Mining
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)
# min_threshold=1 already keeps lift >= 1; this also drops lift == 1 exactly (no association)
rules = rules[rules['lift'] > 1].sort_values(by='lift', ascending=False).reset_index(drop=True)

print('Top rules (sorted by lift):')
print(rules[['antecedents','consequents','support','confidence','lift']].head(10))

# Explanation: Antecedent -> Consequent (if antecedent bought, likely consequent bought). Lift > 1 indicates positive association.

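**Association rules summary:** Rules express "if-then" relations (antecedent → consequent). We kept rules with lift > 1, i.e. pairings that occur more often than chance would predict. Columns: support (share of transactions containing both sides), confidence (share of antecedent transactions that also contain the consequent), lift (confidence divided by the consequent's own support).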
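The three metrics can be checked by hand. Using the standard definitions (support of A → C is the fraction of transactions containing A ∪ C; confidence is support(A ∪ C) / support(A); lift is confidence / support(C)), the hypothetical helper below computes them for one rule on toy baskets. It assumes the antecedent occurs in at least one transaction; otherwise confidence is undefined.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Hypothetical helper: (support, confidence, lift) for antecedent -> consequent.

    Assumes the antecedent appears in at least one transaction.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= b for b in baskets)         # transactions containing A
    n_c = sum(c <= b for b in baskets)         # transactions containing C
    n_ac = sum((a | c) <= b for b in baskets)  # transactions containing A and C
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


toy = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk", "eggs"], ["bread"]]
print(rule_metrics(toy, ["eggs"], ["milk"]))   # lift > 1: positive association
print(rule_metrics(toy, ["bread"], ["eggs"]))  # lift < 1: negative association
```

In this toy data {eggs} → {milk} has lift 4/3 (milk is more likely in egg baskets than overall), while {bread} → {eggs} has lift 2/3 and would be removed by the lift > 1 filter.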

## Visualization
Create a single scatter plot: support (x) vs confidence (y), point size shows lift. This helps quickly spot strong and reliable rules.

In [None]:
# Visualization
plt.figure(figsize=(10, 6))
# size markers by lift (scaled)
sizes = (rules['lift'] - rules['lift'].min() + 0.1) * 200
plt.scatter(rules['support'], rules['confidence'], s=sizes, alpha=0.6)
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.title('Support vs Confidence (size ~ Lift)')
plt.grid(True)

# Annotate top 5 rules by lift for clarity
for _, row in rules.head(5).iterrows():
    ant = ','.join(list(row['antecedents']))
    cons = ','.join(list(row['consequents']))
    plt.annotate(f"{ant} → {cons}", (row['support'], row['confidence']), fontsize=8, alpha=0.8)

plt.show()

**Visualization summary:** The scatter plot places support on the x-axis and confidence on the y-axis; larger points have higher lift and mark the strongest associations. Use this plot to pick rules for promotions: high lift combined with reasonable confidence.
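In the plotting cell, marker size is `(lift - min_lift + 0.1) * 200`, so sizes are relative: the weakest rule in the current set always gets the smallest marker (area 20 in points², since matplotlib's `s` is an area), whatever its absolute lift. A plain-Python mirror of that formula, using the hypothetical helper `marker_sizes`:

```python
def marker_sizes(lifts, offset=0.1, scale=200):
    """Mirror of the plotting cell's sizing: shift so the smallest lift maps to
    `offset`, then multiply by `scale` to get marker areas (points squared)."""
    lowest = min(lifts)
    return [(lift - lowest + offset) * scale for lift in lifts]


print(marker_sizes([1.2, 1.7, 2.2]))  # roughly [20, 120, 220]
print(marker_sizes([3.0, 3.5, 4.0]))  # also roughly [20, 120, 220]
```

Because both rule sets produce the same sizes, marker size alone cannot be compared across plots built from different rule sets; read the actual lift values for that.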

# Business Insights
- **Whole milk is central**: milk frequently appears in high-lift rules together with bakery items (e.g., rolls/buns) and yogurt, which makes it a good target for point-of-sale cross-sells and bundled discounts.
- **Meal combinations exist**: combinations like sausage + rolls/buns + whole milk suggest small meal or breakfast bundles that can be promoted together.
- **Promotions strategy**: use high-confidence rules for store-wide placement (e.g., place common pairs closer), and high-lift rules (even if lower support) for targeted coupons or personalized offers.
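The placement-versus-targeting split in the last bullet can be expressed as two filters over the rules. The sketch uses plain dicts with made-up metric values and hypothetical cut-offs (`MIN_CONFIDENCE = 0.3`, `MIN_LIFT = 1.5`); real thresholds would come from the rules this notebook produces. On the pandas `rules` DataFrame the same filters would be boolean masks such as `rules[rules['confidence'] >= 0.3]`.

```python
# Made-up rule metrics for illustration only; not results from this dataset.
toy_rules = [
    {"rule": "item_a -> item_b", "support": 0.05, "confidence": 0.40, "lift": 1.1},
    {"rule": "item_c -> item_d", "support": 0.01, "confidence": 0.15, "lift": 2.3},
    {"rule": "item_e -> item_f", "support": 0.03, "confidence": 0.35, "lift": 1.6},
]

# Hypothetical cut-offs; tune them on the notebook's actual rules.
MIN_CONFIDENCE = 0.3  # reliable enough to justify moving items closer together
MIN_LIFT = 1.5        # strong enough association to justify a targeted offer

placement = sorted(
    (r for r in toy_rules if r["confidence"] >= MIN_CONFIDENCE),
    key=lambda r: r["confidence"], reverse=True,
)
targeted = sorted(
    (r for r in toy_rules if r["lift"] >= MIN_LIFT),
    key=lambda r: r["lift"], reverse=True,
)

print("Placement:", [r["rule"] for r in placement])
print("Targeted:", [r["rule"] for r in targeted])
```

A rule can land in both lists (here `item_e -> item_f`), and a low-support rule such as `item_c -> item_d` can still qualify for targeting because the lift filter ignores support.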

**How to use these rules (simple next steps):**
1. Test a small bundle (milk + rolls) with a short coupon to measure the change in sales.
2. Place complementary items near each other to increase impulsive pair purchases.
3. Use targeted promotions (loyalty app or weekly flyer) for high-lift item pairs to raise average basket value.
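Measuring the effect of a bundle or coupon comes down to comparing average basket value between a control group and a treatment group. The hypothetical helper below returns the absolute and relative difference in means for made-up basket totals; a real test would also need a significance check, which this sketch omits.

```python
from statistics import mean


def basket_value_uplift(control, treatment):
    """Hypothetical helper: absolute and relative change in mean basket value
    of the treatment group versus the control group (no significance test)."""
    base = mean(control)
    diff = mean(treatment) - base
    return diff, diff / base


# Made-up basket totals (in currency units) for illustration only.
control = [20.0, 25.0, 30.0]
treatment = [24.0, 27.0, 33.0]
absolute, relative = basket_value_uplift(control, treatment)
print(f"Uplift: {absolute:.2f} per basket ({relative:.1%})")
```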

# Conclusion
The analysis produced actionable rules for cross-selling and bundling; these can be A/B tested immediately to measure uplift in average transaction value.

# README

## Project Overview
This project focuses on Market Basket Analysis (MBA) using transaction data to uncover purchasing patterns and provide actionable business insights.

## Dataset Description
The dataset used in this analysis is `data/Groceries_dataset.csv` (provided). It contains transaction-level grocery purchases with columns: `Member_number`, `Date`, and `itemDescription`.

## Steps Performed
1. Imported necessary libraries.
2. Loaded and inspected the dataset.
3. Grouped item rows by `Member_number` into per-customer transactions.
4. Applied one-hot encoding to create a basket dataframe.
5. Conducted frequent itemset mining using the Apriori algorithm.
6. Generated association rules and filtered them based on lift (>1).
7. Visualized the results.
8. Provided business insights and conclusions.

## Key Insights
- Identified strong cross-selling opportunities.
- Suggested product bundling strategies.
- Provided insights for optimizing store layouts.

## Tools Used
- Python
- Jupyter Notebook
- Libraries: pandas, numpy, matplotlib, seaborn, mlxtend