# MARKET BASKET ANALYSIS USING THE APRIORI ALGORITHM

## 1. Project Overview

Objective: This project analyzes customer purchase patterns with Market Basket Analysis and the Apriori algorithm. The goal is to uncover association rules that help a supermarket develop effective marketing strategies, optimize product placement, and increase sales through promotions and bundling.

Dataset: This project uses two datasets:

- A toy dataset to demonstrate how the Apriori algorithm works.
- A larger dataset of historical customer purchase records from a supermarket, sourced from Kaggle. The dataset can be accessed [here](https://drive.google.com/file/d/1GO28zbMfhy6G6COiLDGr90Bk6Ig0hV6K/view).

## 2. Apriori Algorithm on the Toy Dataset


In [1]:
# 2.1. Importing Required Libraries

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
# 2.2. Creating the Toy Dataset
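# The toy dataset consists of a few small transactions to demonstrate how the Apriori algorithm identifies frequent itemsets and association rules.
# Items repeated within a transaction (e.g. 'Hat' in the fourth row) count once after encoding.

# Sample toy dataset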
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants']
]



In [3]:
# 2.3. Data Preprocessing
# We transform the dataset into a transaction format using TransactionEncoder.

# Convert the dataset into a transaction format
te = TransactionEncoder()
te_ary = te.fit(toy_dataset).transform(toy_dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
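
To make the encoding step concrete, here is a minimal standard-library sketch of what a one-hot transaction encoding produces: one boolean column per distinct item and one row per transaction. The helper name `one_hot_encode` is hypothetical and is not part of mlxtend; it only illustrates the shape of the result, assuming columns are ordered alphabetically.

```python
# Stdlib-only sketch of one-hot transaction encoding.
# `one_hot_encode` is a hypothetical helper, not an mlxtend function.

toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]


def one_hot_encode(transactions):
    """Return (sorted item names, list of boolean rows)."""
    # One column per distinct item across all transactions.
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # duplicates within a basket collapse here
        rows.append([item in present for item in columns])
    return columns, rows


columns, rows = one_hot_encode(toy_dataset)
print(columns)
for row in rows:
    print(row)
```

Because each row only records whether an item is present, the repeated 'Hat' and 'Skirt' entries in the last two transactions do not change the encoded table.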



In [4]:
# 2.4. Applying the Apriori Algorithm
# We find frequent itemsets that appear in at least 30% of the transactions.


# Identify frequent itemsets with a minimum support of 0.3
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
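
For readers who want to see the level-wise search behind this call, the following is a simplified standard-library sketch of the Apriori idea, not mlxtend's implementation. It counts support for candidates of size k, keeps those at or above the threshold, and builds size k+1 candidates only from surviving itemsets. The function name `apriori_sketch` is an assumption made for illustration.

```python
from itertools import combinations

toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only candidates that meet the support threshold.
        level = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)

        # Join step: combine frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori_sketch(toy_dataset, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most larger combinations are never counted.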



In [5]:
# 2.5. Generating Association Rules
# We generate association rules using confidence as the metric, with a threshold of 70%.


# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Display results
print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
    support                       itemsets
0       0.4                        (Dress)
1       0.4                     (Earrings)
2       0.6                          (Hat)
3       0.4                        (Heels)
4       0.8                        (Pants)
5       0.8                        (Scarf)
6       0.6                        (Skirt)
7       0.4                     (Sneakers)
8       0.4                 (Heels, Dress)
9       0.4                 (Dress, Scarf)
10      0.4              (Earrings, Scarf)
11      0.4                   (Hat, Pants)
12      0.4                   (Hat, Scarf)
13      0.4                   (Skirt, Hat)
14      0.4                (Hat, Sneakers)
15      0.4                 (Heels, Scarf)
16      0.6                 (Pants, Scarf)
17      0.6                 (Skirt, Pants)
18      0.4              (Pants, Sneakers)
19      0.4                 (Skirt, Scarf)
20      0.4              (Skirt, Sneakers)
21      0.4          (Heels, Dress,



### 2.6. Interpretation of Results

Frequent Itemsets:

Most Frequent Items: Pants and Scarf, each with a support of 0.8, indicating they are present in 80% of the transactions.

Common Pairs: Some frequent pairs include (Pants, Scarf) and (Pants, Skirt), each appearing in 60% of transactions.

Association Rules:

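Example: The rule (Dress) → (Heels) has a confidence of 1.0: both transactions that include a dress also include heels.

Lift: The rule (Pants) → (Skirt) has a lift of 1.25 (confidence 0.75 against a Skirt support of 0.6), so a skirt appears 25% more often in baskets with pants than in baskets overall. A lift above 1 indicates a positive association.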
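
The confidence and lift figures quoted above can be checked by hand from the toy transactions. The sketch below uses only the standard library; `rule_metrics` is a hypothetical helper written for this illustration and assumes the antecedent occurs in at least one transaction.

```python
toy_dataset = [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    count_a = sum(a <= b for b in baskets)          # baskets with the antecedent
    count_c = sum(c <= b for b in baskets)          # baskets with the consequent
    count_ac = sum((a | c) <= b for b in baskets)   # baskets with both
    confidence = count_ac / count_a                 # P(consequent | antecedent)
    lift = confidence / (count_c / n)               # confidence vs. baseline rate
    return {"support": count_ac / n, "confidence": confidence, "lift": lift}


for lhs, rhs in [(['Dress'], ['Heels']), (['Pants'], ['Skirt']), (['Pants'], ['Scarf'])]:
    m = rule_metrics(toy_dataset, lhs, rhs)
    print(lhs, '->', rhs, {k: round(v, 4) for k, v in m.items()})
```

Comparing (Pants) → (Skirt) with (Pants) → (Scarf) shows why lift is worth reading alongside confidence: both rules have the same confidence, but only the first has a lift above 1.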

### Business Recommendations:

Product Bundling: Since dresses are always accompanied by heels, consider offering bundles or placing these items together.

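Promotions: Pants and scarves appear together in 60% of transactions, so a joint discount would reach many baskets. Their lift is about 0.94 (0.6 / (0.8 × 0.8)), however, so the pairing reflects how popular each item is rather than a positive association.

Optimized Product Placement: High-confidence rules such as (Sneakers) → (Hat), which has a confidence of 1.0 in this dataset, suggest placing these products near each other to encourage cross-selling.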

These insights highlight how association rule mining can be used to optimize product placements, marketing campaigns, and bundling strategies in a retail environment.


## 3. Apriori Algorithm on Real-World Dataset

In [6]:
# 3.1. Importing and Preparing the Dataset

import pandas as pd
import plotly.express as px

# Load the dataset from Google Drive
from google.colab import drive
drive.mount('/content/drive')

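# Load the dataset
# Note: read_csv treats the first row as column names by default. If the CSV has no
# header row, the first transaction is consumed as the header (the 'shrimp' column
# used in 3.2 suggests this may be the case); passing header=None would keep it as data.
df = pd.read_csv("/content/drive/MyDrive/Untitled folder/DATASETS/Market_Basket_Optimisation.csv")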



Mounted at /content/drive


In [7]:
# 3.2. Visualize Data with Plotly

# Basic visualization to understand the data
fig = px.histogram(df, x='shrimp')
fig.show()



In [8]:
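# 3.3. Data Preprocessing
# We handle missing values and prepare the dataset for transaction analysis.


# Replace NaN values with the placeholder string 'NA'
# (the encoder below then treats 'NA' as an item; see the interpretation in 3.6)
df_filled = df.fillna('NA')

# Convert all items in the dataset to strings
# (DataFrame.applymap is deprecated in recent pandas; DataFrame.map is the replacement)
transactions = df_filled.applymap(str).values.tolist()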

# Transform dataset into transaction format
te_checkpoint = TransactionEncoder()
te_ary_checkpoint = te_checkpoint.fit(transactions).transform(transactions)
df_checkpoint = pd.DataFrame(te_ary_checkpoint, columns=te_checkpoint.columns_)







In [9]:
# 3.4. Applying Apriori Algorithm

# Find frequent itemsets with a support threshold of 10%
frequent_itemsets_checkpoint = apriori(df_checkpoint, min_support=0.1, use_colnames=True)





In [10]:
# 3.5. Generating Association Rules

# Generate association rules
rules_checkpoint = association_rules(frequent_itemsets_checkpoint, metric="confidence", min_threshold=0.1)

# Display results
print("Frequent Itemsets in Checkpoint Dataset:")
print(frequent_itemsets_checkpoint)
print("\nAssociation Rules in Checkpoint Dataset:")
print(rules_checkpoint)

Frequent Itemsets in Checkpoint Dataset:
     support             itemsets
0   1.000000                 (NA)
1   0.163867          (chocolate)
2   0.179733               (eggs)
3   0.170933       (french fries)
4   0.132000          (green tea)
5   0.129600               (milk)
6   0.238267      (mineral water)
7   0.174133          (spaghetti)
8   0.163867      (chocolate, NA)
9   0.179733           (NA, eggs)
10  0.170933   (french fries, NA)
11  0.132000      (green tea, NA)
12  0.129600           (milk, NA)
13  0.238267  (NA, mineral water)
14  0.174133      (NA, spaghetti)

Association Rules in Checkpoint Dataset:
        antecedents      consequents  antecedent support  consequent support  \
0       (chocolate)             (NA)            0.163867            1.000000   
1              (NA)      (chocolate)            1.000000            0.163867   
2              (NA)           (eggs)            1.000000            0.179733   
3            (eggs)             (NA)            0.179





### 3.6. Interpretation and Business Recommendations

Frequent Itemsets:
Mineral Water appears in 23.83% of transactions, followed by Eggs (17.97%) and Spaghetti (17.41%).

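'NA' has a support of 1.0 because step 3.3 fills every missing cell with the string 'NA', and the encoder then treats that string as an item. It is a preprocessing placeholder, not a product, and should be removed from the transactions before mining.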

Association Rules:

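Confidence & Lift:
At the 10% support threshold, every multi-item itemset in the output contains 'NA', so every generated rule involves 'NA'. Rules between real products, such as (Mineral Water → Spaghetti), would only appear at a lower support threshold and could then inform bundling and discounts.
Example Rule: Whenever 'Green Tea' or 'Spaghetti' is present, 'NA' is present with 100% confidence. This is expected rather than informative: 'NA' occurs in every transaction (support 1.0), so these rules have a lift of 1.0.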
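
For any rule that pairs a product with 'NA', confidence and lift follow directly from the supports printed in the frequent-itemset output. The short calculation below uses the chocolate row as an example; the variable names are chosen for this illustration only.

```python
# Supports copied from the frequent-itemset output above.
support_chocolate = 0.163867
support_na = 1.0
support_both = 0.163867  # itemset (chocolate, NA)

# Rule (chocolate) -> (NA)
confidence_fwd = support_both / support_chocolate
lift_fwd = confidence_fwd / support_na

# Rule (NA) -> (chocolate)
confidence_rev = support_both / support_na
lift_rev = confidence_rev / support_chocolate

print("chocolate -> NA:", confidence_fwd, lift_fwd)
print("NA -> chocolate:", confidence_rev, lift_rev)
```

Both directions have a lift of 1.0, which means that knowing 'NA' is present says nothing about chocolate and vice versa. The same holds for every other product paired with 'NA'.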

Business Recommendations:

Address Data Quality: 'NA' appears in every transaction because of the fill step in 3.3. Drop missing values from each transaction before encoding instead of replacing them with a placeholder.

Create Bundles and Offers: Mineral Water, Eggs, and Spaghetti are the most frequently purchased products individually. Whether they are bought together should be checked by rerunning the analysis without 'NA' and with a lower support threshold before designing promotions around them.

Further Analysis: Segment the data or explore seasonal patterns to refine marketing strategies.
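
One way to keep the 'NA' placeholder out of the analysis is to drop missing cells from each row before encoding, rather than filling them. The sketch below is a standard-library illustration under that assumption. `is_missing` and `rows_to_transactions` are hypothetical helpers, and the sample rows are made up to mimic what `df.values.tolist()` returns for a ragged CSV, where empty cells arrive as float NaN.

```python
import math


def is_missing(value):
    """True for None or a float NaN, the usual markers for an empty cell."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_transactions(rows):
    """Drop missing cells from each row and skip rows that end up empty."""
    transactions = []
    for row in rows:
        basket = [str(value).strip() for value in row if not is_missing(value)]
        if basket:
            transactions.append(basket)
    return transactions


# Made-up sample rows, padded with NaN as a wide DataFrame would be.
nan = float("nan")
sample_rows = [
    ["mineral water", "eggs", nan, nan],
    ["spaghetti", nan, nan, nan],
    [nan, nan, nan, nan],
]

transactions = rows_to_transactions(sample_rows)
print(transactions)
```

The resulting list of lists can be passed to the same encoding and mining steps as before, and the frequent itemsets will then contain only real products.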

## 4. Conclusion

This project demonstrates the use of the Apriori algorithm to uncover hidden patterns in supermarket purchase data. By identifying frequently bought items and high-confidence associations, we can make informed recommendations for marketing strategies, product placement, and promotional offers. Removing the 'NA' placeholder introduced during preprocessing would allow for deeper insights and more effective strategies.

Key Takeaways:

- The Apriori algorithm is an effective tool for Market Basket Analysis.
- Frequent itemsets and association rules provide insights into customer purchase behavior.
- Business recommendations can drive strategic marketing and sales improvements.

Next Steps: Future analysis could include deeper segmentation, time-series analysis to understand buying patterns across different seasons, or advanced algorithms for better recommendation systems.