# 1. Business Understanding (CRISP-DM Phase 1)

## 1.1. Project Context
This project utilizes the **RetailRocket E-commerce dataset**, capturing real-world implicit feedback (views, cart adds, transactions). The core challenge is to translate raw behavioral logs into a system that drives marketing decisions.

**Data Source:** RetailRocket E-commerce Dataset (Kaggle).

## 1.2. Business Goal
The primary objective is to build an end-to-end **Customer Segmentation & Personalization System**. The system aims to replace "mass marketing" with data-driven targeting strategies.

**Key Deliverables:**

1.  **Segmentation Model:** Group customers based on RFM (Recency, Frequency, Monetary) and behavioral patterns (conversion rate, item categories) using K-Means.
2.  **CLV Prediction:** Estimate Customer Lifetime Value using probabilistic models (BG/NBD, Gamma-Gamma) to identify high-value users.
3.  **Actionable Rules (The Action Board):** A concrete mapping of "Segment -> Marketing Action" (e.g., *If VIP -> Grant Early Access*, If Prospect -> ..., If Hibernating ->...).
    * Define clear trigger conditions for each segment (e.g., VIP = High CLV & High Frequency)
    * Specify concrete marketing actions and channels (e.g., early-access email, personalized offers)
    * Propose measurable business KPIs for future evaluation (e.g., conversion rate, revenue per user).

4.  **Experimentation Framework:** Because the dataset is static and observational, direct A/B testing cannot be conducted. Instead, this project proposes an experimentation framework that could be applied in a real production environment. The framework (tentative) includes:
    * Definition of control and treatment groups per segment. The control group receives no new strategies.
    * Suggested success metrics (e.g., uplift in conversion rate, incremental revenue).
    * Guidelines for experiment duration and sample size considerations.
5.  **Personalization Dashboard:** A visual storytelling tool including:
    * *Treemaps* for segment sizing.
    * *Line Charts* for product trends.
    * (more to be added)
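
For the sample-size guideline in the experimentation framework, a minimal sketch of the standard two-proportion approximation is shown below. The function name `sample_size_per_group`, the baseline and target conversion rates, and the defaults for `alpha` and `power` are illustrative assumptions, not values taken from the dataset.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-sided two-proportion z-test.

    p_control / p_treatment are conversion rates expressed as fractions (assumed inputs).
    """
    if p_control == p_treatment:
        raise ValueError("The two conversion rates must differ.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: detect a lift from 2.0% to 2.5% conversion within one segment.
n_per_group = sample_size_per_group(0.020, 0.025)
```

Under these assumed rates the formula asks for roughly 14,000 visitors per group, which suggests that small segments may need longer experiment durations or a larger minimum detectable effect.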

## 1.3. The DOC Framework (Decision - Options - Criteria)
To bridge the gap between technical models and business value:

### **D - Decision**
**"How should the marketing budget be allocated, and which promotional channel should be used for each customer group?"**

**"When should these strategies be applied (seasonally or monthly)?"**

### **O - Options (Hypothetical Actions)**
* **VIP Zone – Loyal Customers**  
  (High Frequency & High Monetary):  
  Offer **Early Access / Exclusive Benefits** to maintain engagement while protecting profit margins.
* **Growth Zone – Recent & High Potential**  
  (Recent activity, moderate spend, high conversion tendency):  
  Trigger **Welcome or Second-Purchase Incentives** to encourage repeat purchases.
* **Low Value Zone – Low Spend & Low Loyalty**  
  (Low Frequency & Low Monetary, but still active):  
  Apply **Light Promotions or Bundle Offers** to increase purchase intent and basket size.
* **Hibernating Zone – At Risk Customers**  
  (Long time since last purchase, declining engagement):  
  Use **Win-back Campaigns with Time-Limited Vouchers** to reactivate customers.
* **Non-Transactors**  
  (Users with views/add-to-cart but no transactions):  
  Target with **acquisition-style tactics**: onboarding emails, first-purchase discounts, product discovery flows, and remarketing to convert to first purchase.
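
The zones above can be expressed as a small rule table. The sketch below is one possible encoding: the `Thresholds` values, the function name `assign_zone`, and the order in which the rules are checked are all hypothetical and would need to be calibrated on the actual RFM distributions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical cut-offs for illustration only; not derived from the dataset.
    recent_days: int = 30
    hibernating_days: int = 90
    high_frequency: int = 5
    high_monetary: float = 500.0


# Segment -> marketing action, taken from the options listed above.
ACTIONS = {
    "VIP": "Early access / exclusive benefits",
    "Growth": "Welcome or second-purchase incentive",
    "Low Value": "Light promotion or bundle offer",
    "Hibernating": "Win-back campaign with time-limited voucher",
    "Non-Transactor": "Onboarding email, first-purchase discount, remarketing",
}


def assign_zone(recency_days, frequency, monetary, t=Thresholds()):
    """Map one customer's RFM values to a zone using simple ordered rules."""
    if frequency == 0:                      # views / add-to-cart only, no purchase
        return "Non-Transactor"
    if recency_days > t.hibernating_days:   # long time since last purchase
        return "Hibernating"
    if frequency >= t.high_frequency and monetary >= t.high_monetary:
        return "VIP"
    if recency_days <= t.recent_days:       # recent buyer, not yet high value
        return "Growth"
    return "Low Value"


zone = assign_zone(recency_days=10, frequency=8, monetary=900.0)
action = ACTIONS[zone]
```

Because the rules are checked in order, a formerly high-value customer who has gone quiet is labeled Hibernating rather than VIP. That is a design choice of this sketch, not a requirement of the framework.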

### **C – Criteria (Success Metrics)**

1. **Segmentation Quality (Interpretability & Stability):**  
   Customer segments should be clearly interpretable based on RFM and CLV characteristics, showing consistent and meaningful behavioral differences that can support business decision-making (rather than purely optimizing statistical metrics).

2. **Business Actionability (tentative):**  
   Each segment must be easily mapped to a distinct marketing strategy (e.g., retention, growth, reactivation), ensuring that the segmentation results are actionable in real-world marketing operations.

3. **Dashboard Effectiveness:**  
   The dashboard should clearly communicate the segmentation insights, segment sizes, and proposed actions, enabling stakeholders to quickly understand customer distribution and prioritize interventions.

4. **Experiment Readiness (Conceptual):**  
   The framework should define clear evaluation metrics (e.g., Conversion Rate, Retention Rate, Revenue per User) that could be used in future A/B testing to measure the effectiveness of personalized strategies.


## 1.4. KPI Tree & Metric Decomposition

Applying the **5-step KPI Tree process**, we structure the metrics based on the available data constraints.

### **Step 1: North Star Metric**
**Target:** **GMV (Gross Merchandise Value)**.
* *Clarification:* Since the dataset lacks data on cancellations, returns, and basket-level discounts, we optimize for GMV (total value of merchandise sold) as a proxy for revenue given the observed data.

### **Step 2: Metric Decomposition (Drivers)**
Decomposition Formula:
$$GMV = \text{Traffic} \times \text{Conversion Rate (CR)} \times \text{Average Ticket Size (ATS)}$$

*Note: We use "Average Ticket Size" instead of AOV because order-level (basket) identifiers are unavailable; calculations are item-based.*
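
To see why the decomposition holds, write each driver as a ratio of counts, with CR taken as a fraction rather than a percentage. The symbols $N_{\text{visitors}}$ and $N_{\text{paying}}$ are shorthand introduced here for the visitor counts:

$$GMV = \underbrace{N_{\text{visitors}}}_{\text{Traffic}} \times \underbrace{\frac{N_{\text{paying}}}{N_{\text{visitors}}}}_{CR} \times \underbrace{\frac{\sum \text{Price}}{N_{\text{paying}}}}_{ATS} = \sum \text{Price}$$

As an illustration with made-up numbers: 1,000 visitors, of whom 20 purchase items worth 3,000 in total, give $CR = 20 / 1000 = 0.02$ and $ATS = 3000 / 20 = 150$, so $GMV = 1000 \times 0.02 \times 150 = 3000$.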

#### **Driver 1: Traffic (Total Visitors)**
* **Metric:** Count of unique `visitorid`.
* *Scope:* Out of scope for this project (traffic cannot be influenced with this dataset).

#### **Driver 2: Conversion Rate (CR)**
* **Definition:** % of visitors who perform at least one `transaction`.
* **Formula:**
    $$CR = \frac{\text{Count(Visitors with ≥1 'transaction' event)}}{\text{Count(Total Visitors)}} \times 100\%$$

*Note: Conversion Rate in the KPI Tree is defined at the visitor level, while funnel-level transition probabilities are used separately for behavioral analysis.*

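
The difference between the visitor-level CR and funnel-level transitions can be sketched as follows. The input shape (pairs of `visitorid` and event name) and the event names `view` and `addtocart` are assumptions about how the logs are encoded; only `transaction` is named in this document.

```python
from collections import defaultdict


def funnel_metrics(events):
    """Compute visitor-level CR and step-to-step funnel transitions.

    events: iterable of (visitorid, event) pairs (assumed input shape).
    All results are fractions, not percentages.
    """
    seen = defaultdict(set)
    for visitor_id, event in events:
        seen[event].add(visitor_id)

    visitors = set().union(*seen.values())
    viewers, carters, buyers = seen["view"], seen["addtocart"], seen["transaction"]

    def ratio(numerator, denominator):
        return len(numerator) / len(denominator) if denominator else 0.0

    return {
        "visitor_cr": ratio(buyers, visitors),               # KPI Tree definition
        "view_to_cart": ratio(viewers & carters, viewers),   # funnel transition
        "cart_to_purchase": ratio(carters & buyers, carters),
    }


# Tiny made-up log: visitor 1 buys, visitor 2 abandons the cart, 3 and 4 only browse.
sample_events = [
    (1, "view"), (1, "addtocart"), (1, "transaction"),
    (2, "view"), (2, "addtocart"),
    (3, "view"),
    (4, "view"),
]
metrics = funnel_metrics(sample_events)
```

In this toy log the visitor-level CR is 1 in 4, while each funnel step converts 1 in 2, which shows why the two views should be reported separately.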
#### **Driver 3: Average Ticket Size (ATS) / Monetary**
* **Definition:** Average value generated per paying visitor (proxy for AOV).
* **Formula:**
    $$ATS \approx \frac{\sum (\text{Price of items in 'transaction'})}{\text{Count(Paying Visitors)}}$$
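
A minimal sketch of computing the three drivers and checking that they multiply back to GMV is given below. The function name `kpi_tree` and the input shapes are assumptions; prices are assumed to be already joined onto each `transaction` row.

```python
import math


def kpi_tree(visitor_ids, transactions):
    """Compute Traffic, CR, ATS and GMV from item-level transaction rows.

    visitor_ids: iterable of every visitorid seen in the window.
    transactions: list of (visitorid, item_price) pairs, one per purchased item.
    """
    traffic = len(set(visitor_ids))
    paying = {visitor_id for visitor_id, _ in transactions}
    gmv = sum(price for _, price in transactions)
    cr = len(paying) / traffic if traffic else 0.0   # fraction, not percent
    ats = gmv / len(paying) if paying else 0.0       # value per paying visitor
    return {"traffic": traffic, "cr": cr, "ats": ats, "gmv": gmv}


# Made-up data: 10 visitors, 2 of whom buy 3 items in total.
kpis = kpi_tree(range(1, 11), [(1, 20.0), (1, 30.0), (2, 50.0)])
reconstructed_gmv = kpis["traffic"] * kpis["cr"] * kpis["ats"]
```

Here `reconstructed_gmv` matches `kpis["gmv"]` up to floating-point error, which is a quick sanity check that the three drivers were computed on the same visitor population and time window.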

### **Step 3: Business Levers (Within Project Scope)**
* **Segmentation Lever – RFM & Behavioral Clustering (K-Means / GMM):**
    * *Purpose:* Group customers with similar purchase recency, frequency, and monetary patterns.
    * *Application:* Enable differentiated treatment instead of mass marketing.

* **Prioritization Lever – CLV Estimation:**
    * *Purpose:* Identify high-value customers and allocate personalization efforts accordingly.
    * *Application:* Focus retention and engagement strategies on users with high predicted lifetime value.

* **Personalization Lever – Rule-based Action Board:**
    * *Purpose:* Translate segments into concrete marketing actions.
    * *Application:* Define rules such as “If Segment = At Risk & CLV High → Retention-oriented messaging”.

* *Future Extension (Out of Scope):*
    * Recommendation systems could be layered on top of this framework to increase Average Ticket Size in future work.
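
The segmentation and CLV levers both start from per-customer RFM features. The sketch below shows one way to derive them from item-level purchases. It assumes timestamps have already been converted to dates and prices joined onto each row, and it counts Frequency as the number of distinct purchase days, which is an assumption made here because order-level identifiers are unavailable.

```python
from datetime import date


def rfm_features(transactions, snapshot):
    """Build Recency / Frequency / Monetary values per visitor.

    transactions: iterable of (visitorid, purchase_date, item_price) tuples.
    snapshot: the reference date from which recency is measured.
    """
    rows = {}
    for visitor_id, day, price in transactions:
        row = rows.setdefault(visitor_id, {"last": day, "days": set(), "monetary": 0.0})
        row["last"] = max(row["last"], day)   # most recent purchase date
        row["days"].add(day)                  # distinct purchase days
        row["monetary"] += price              # total spend
    return {
        visitor_id: {
            "recency_days": (snapshot - row["last"]).days,
            "frequency": len(row["days"]),
            "monetary": row["monetary"],
        }
        for visitor_id, row in rows.items()
    }


# Made-up purchases for two visitors.
sample_transactions = [
    (1, date(2015, 6, 1), 20.0),
    (1, date(2015, 6, 1), 30.0),
    (1, date(2015, 6, 10), 50.0),
    (2, date(2015, 5, 1), 10.0),
]
rfm = rfm_features(sample_transactions, snapshot=date(2015, 6, 30))
```

The resulting dictionary can be scaled and passed to a clustering step, or its values fed into the rule-based Action Board.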


---

### **Step 4: Guardrails (Measurement Assumptions)**
1. **Transaction Validity Assumption:**
    * All `transaction` events are treated as successful purchases at the recorded `item_price`.
    * Returns, cancellations, and refunds are not observable in the dataset.
2. **Fixed Time Window Assumption:**
    * All KPIs are computed within a fixed historical time window.
    * No temporal feedback loop or post-intervention effects are measured.

### **Step 5: Dashboard Visualization & Review Cadence**
* **Metric 1:** Total GMV over time (Line Chart) — North Star tracking.
* **Metric 2:** Segment Migration (Stacked Area — e.g., how many "New" users became "VIP").
* **Metric 3:** RFM Distribution (Heatmap or Scatter).
* *Review Cadence:* Periodic analytical review (weekly/monthly), not real-time monitoring.
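
The Segment Migration view needs counts of how many customers moved from one segment to another between two review periods. A minimal sketch is shown below, assuming segment labels have already been assigned for each period; the function name `segment_migration` and the labels are illustrative.

```python
from collections import Counter


def segment_migration(before, after):
    """Count (from_segment, to_segment) moves for visitors present in both periods.

    before / after: dicts mapping visitorid -> segment label (assumed inputs).
    """
    shared = before.keys() & after.keys()
    return Counter((before[v], after[v]) for v in shared)


# Made-up labels for two consecutive review periods.
previous_period = {1: "New", 2: "New", 3: "VIP"}
current_period = {1: "VIP", 2: "New", 3: "VIP", 4: "New"}
migration = segment_migration(previous_period, current_period)
```

Visitors who appear in only one period (visitor 4 here) are excluded from the counts, so arrivals and churn would need to be tracked separately.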


In RetailRocket, each `transaction` event is tied to a `visitorid` and an `itemid`:

* There is no `order_id`.
* It is unknown how many items a single checkout contains or which items belong to the same order.
* It is therefore not possible to determine what constitutes "one order".

Why is ATS reasonable?

* It does not need an `order_id`; it relies entirely on `transaction`, `item_price`, and `visitorid`.
* It reflects the Monetary dimension of RFM.
* ATS is a valid proxy for "spending level", not for "value per order".

## 1.5. Project Roadmap (CRISP-DM)

* **01_Business_Understanding:** Defining goals, DOC framework, and KPI Tree.
* **02_Data_Understanding:** Exploratory Data Analysis (EDA).
* **03_Data_Preparation:** Feature engineering (RFM) and Data Cleaning.
* **04.1_Modeling_Segmentation:** K-Means Clustering.
* **04.2_Modeling_CLV:** Probabilistic Prediction (BG/NBD + Gamma-Gamma).
* **05_Evaluation_Deployment:** Persona Profiling and Action Board.
* **06_Experimentation Framework & Dashboard:** 