**ASSOCIATION RULE MINING - ASSIGNMENT**

**1. Objective**

The primary objective of this assignment is to understand and implement
association rule mining techniques for identifying frequent itemsets and
extracting meaningful rules that capture relationships among items
frequently purchased together. This is particularly useful in retail
settings for **Market Basket Analysis**, which helps businesses
understand customer purchasing behavior and optimize product placement,
promotions, and cross-selling strategies.

**2. Dataset Overview**

The dataset used in this assignment is the **Online Retail dataset**,
which contains transactional data from a UK-based online retail store.
The data includes:

-   **InvoiceNo**: Invoice number (Transaction ID)

-   **StockCode**: Product (item) code

-   **Description**: Name of the product

-   **Quantity**: Number of units purchased

-   **InvoiceDate**: Date and time of transaction

-   **UnitPrice**: Price per unit

-   **CustomerID**: ID of the customer

-   **Country**: Country of the customer

**3. Data Preprocessing**

To make the dataset suitable for association rule mining, the following
preprocessing steps were performed:

**a. Handling Missing Values**

-   Rows with a missing CustomerID were removed, as transactions
    without a customer ID cannot be tied to a specific customer's
    purchasing behavior.

**b. Removing Duplicates**

-   Exact duplicate rows were identified and removed to avoid redundant
    records in rule mining.

**c. Filtering Cancelled Transactions**

-   Transactions with invoice numbers starting with ‘C’ were excluded as
    they indicate cancellations.

**d. Filtering Positive Quantity**

-   Only rows with a **positive quantity** were retained to ensure only
    actual purchases were considered.
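
Steps a through d can be sketched with pandas as follows. This is a minimal illustration rather than the exact code used in the assignment: the toy rows are invented, the helper name `clean_transactions` is hypothetical, and only the column names come from the dataset description above.

```python
import pandas as pd

# Toy rows that mimic the Online Retail schema (values invented for illustration).
raw = pd.DataFrame({
    "InvoiceNo":   ["536365", "536365", "536365", "C536379", "536380", "536381"],
    "Description": ["MUG", "MUG", "BAG", "MUG", "BAG", "CANDLE"],
    "Quantity":    [6, 6, 2, -1, 3, -2],
    "CustomerID":  [17850.0, 17850.0, 17850.0, 14527.0, None, 15311.0],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying preprocessing steps a-d in order."""
    df = df.dropna(subset=["CustomerID"])                      # a. missing customer IDs
    df = df.drop_duplicates()                                  # b. exact duplicate rows
    df = df[~df["InvoiceNo"].astype(str).str.startswith("C")]  # c. cancelled invoices
    df = df[df["Quantity"] > 0]                                # d. positive quantities only
    return df.reset_index(drop=True)


cleaned = clean_transactions(raw)
```

On the toy data, only the two distinct purchase rows of invoice 536365 survive all four filters.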

**e. Creating Basket Format**

-   The dataset was pivoted to a **basket format**, where each row
    corresponds to a transaction and each column to a product, with
    binary encoding (1 if the product was bought in the transaction,
    else 0).
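
One possible way to build the basket format is a group-and-unstack in pandas. The sketch below assumes that `InvoiceNo` identifies the transaction and `Description` identifies the product; the toy data is invented. It stores `True`/`False` instead of 1/0, which carries the same information.

```python
import pandas as pd

# Toy cleaned transactions (values invented for illustration).
cleaned = pd.DataFrame({
    "InvoiceNo":   ["536365", "536365", "536366", "536367", "536367", "536367"],
    "Description": ["MUG", "BAG", "MUG", "MUG", "BAG", "CANDLE"],
    "Quantity":    [6, 2, 1, 3, 4, 12],
})

# Sum quantities per (invoice, product), spread products into columns,
# then binarise: True if the product appears in the invoice at all.
basket = (
    cleaned.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
    .gt(0)
)
```

Each row of `basket` is one invoice and each column one product, so invoice 536366 has `True` only in the `MUG` column.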

**4. Association Rule Mining Using Apriori Algorithm**

**a. Libraries Used**

-   **Python libraries**: pandas, mlxtend.frequent_patterns,
    mlxtend.preprocessing

**b. Frequent Itemset Generation**

The **Apriori algorithm** was used to find frequent itemsets with a
minimum support threshold of 0.01, meaning an itemset must appear in at
least 1% of transactions to be kept.

```python
from mlxtend.frequent_patterns import apriori

# `data` is the binary basket matrix built in step 3e
frequent_itemsets = apriori(data, min_support=0.01, use_colnames=True)
```
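
To show what the level-wise search behind Apriori does, here is a small pure-Python sketch. It is not mlxtend's implementation; the function name `apriori_sketch` and the toy transactions are assumptions made for illustration. It relies on the property that every subset of a frequent itemset must itself be frequent, so candidates with an infrequent subset are pruned before their support is counted.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


toy_transactions = [
    {"MUG", "BAG"},
    {"MUG"},
    {"MUG", "BAG", "CANDLE"},
    {"BAG", "CANDLE"},
]
frequent = apriori_sketch(toy_transactions, min_support=0.5)
```

With a threshold of 0.5, the three single items and the pairs {MUG, BAG} and {BAG, CANDLE} are frequent, while {MUG, CANDLE} appears in only one of four transactions and is dropped.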

**c. Generating Association Rules**

After generating frequent itemsets, association rules were derived from
them and scored by support, confidence, and lift. The initial call keeps
every rule with a lift of at least 1.

```python
from mlxtend.frequent_patterns import association_rules

# metric="lift" with min_threshold=1 keeps rules whose lift is at least 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
```
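
For reference, the three metrics have the following standard definitions for a rule $A \to B$ over $N$ transactions. The symbols $A$, $B$, $t$, and $N$ are notation introduced here, not taken from the assignment.

$$
\text{support}(A \to B) = \frac{\left|\{\, t : A \cup B \subseteq t \,\}\right|}{N}
$$

$$
\text{confidence}(A \to B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}
$$

$$
\text{lift}(A \to B) = \frac{\text{confidence}(A \to B)}{\text{support}(B)} = \frac{\text{support}(A \cup B)}{\text{support}(A)\,\text{support}(B)}
$$

A lift of 1 corresponds to $A$ and $B$ occurring independently; values above 1 mean the items co-occur more often than independence would predict.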

**d. Thresholds Used**

-   **Minimum Support**: 0.01

-   **Minimum Confidence**: 0.2

-   **Minimum Lift**: 3.0

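These thresholds were chosen to ensure that only **strong and relevant
rules** were extracted. Because the `association_rules` call in 4c uses
a lift threshold of 1, the stricter confidence and lift cut-offs have to
be applied as a further filter on its output.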
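
One way to apply the confidence and lift cut-offs is to filter the rules frame with pandas. The sketch below uses a hypothetical stand-in for `rules` with invented numbers; the column names match those produced by mlxtend's `association_rules`.

```python
import pandas as pd

# Hypothetical stand-in for the `rules` frame (numbers invented for illustration).
rules = pd.DataFrame({
    "antecedents": [frozenset({"A"}), frozenset({"B"}), frozenset({"C"})],
    "consequents": [frozenset({"B"}), frozenset({"C"}), frozenset({"A"})],
    "support":     [0.014, 0.020, 0.012],
    "confidence":  [0.30, 0.15, 0.40],
    "lift":        [3.5, 5.0, 1.8],
})

MIN_CONFIDENCE = 0.2
MIN_LIFT = 3.0

# Keep rules that clear both cut-offs, strongest lift first.
strong_rules = (
    rules[(rules["confidence"] >= MIN_CONFIDENCE) & (rules["lift"] >= MIN_LIFT)]
    .sort_values("lift", ascending=False)
    .reset_index(drop=True)
)
```

Here only the first rule passes: the second fails the confidence cut-off and the third fails the lift cut-off.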

**5. Analysis and Interpretation**

**a. Rule Example**

An example of an interesting rule:

-   **{WHITE HANGING HEART T-LIGHT HOLDER} → {JUMBO BAG RED RETROSPOT}**

    -   **Support**: 0.015

    -   **Confidence**: 0.32

    -   **Lift**: 4.1

This suggests that about 32% of transactions containing the **white
hanging heart T-light holder** also contained the **jumbo bag red
retrospot**. A lift of 4.1 means the two items appear together about 4.1
times as often as would be expected if they were purchased
independently.
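
The three reported figures are tied together by the metric definitions, so the individual item supports can be recovered from them. The short calculation below is a sanity check under the assumption that the reported values are rounded; the derived numbers are therefore approximate and are not taken from the assignment's output.

```python
# Reported metrics for the example rule.
support_ab = 0.015   # support of the rule (both items in the same transaction)
confidence = 0.32
lift = 4.1

# confidence = support_ab / support_a  ->  support_a = support_ab / confidence
support_a = support_ab / confidence

# lift = confidence / support_b  ->  support_b = confidence / lift
support_b = confidence / lift

# Co-occurrence rate expected if the two items were bought independently.
expected_if_independent = support_a * support_b
```

This gives an implied support of roughly 4.7% for the antecedent and 7.8% for the consequent, so independence would predict the pair in only about 0.37% of transactions, against the observed 1.5%.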

**b. Key Insights**

-   Certain items like decorative candles, gift bags, and kitchen
    accessories frequently appear together.

-   High lift values (above 3) indicate strong product affinity, which
    can help in:

    -   Designing **product bundles**

    -   Planning **cross-selling** strategies

    -   Optimizing **store layout** (e.g., placing frequently
        bought-together items nearby)

-   Seasonal items (e.g., Christmas-themed products) tend to cluster in
    rules, showing **seasonal purchasing behavior**.

**6. Conclusion**

Association rule mining provides valuable insights into **customer
purchasing patterns**. By applying the Apriori algorithm to the Online
Retail dataset:

-   We uncovered relationships between products that can drive marketing
    and sales strategies.

-   The results support **data-driven decisions** in areas like
    promotions, product placement, and inventory planning.