# 04. RFM Segmentation Exercise

In this exercise you will run an RFM analysis directly in SQL.

**Rules:**
- Try each mission on your own before looking at the hints
- If you get stuck, expand the hint
- When you are done, compare your answers with `solution.sql`

## Setup

In [None]:
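import sqlite3
import pandas as pd
from pathlib import Path

# Connect to the database (sqlite3.connect would silently create an empty file if the path were wrong)
DB_PATH = Path("../data/crm.db")
if not DB_PATH.exists():
    raise FileNotFoundError(f"Database not found: {DB_PATH.resolve()}")
conn = sqlite3.connect(DB_PATH)

# Helper for running SQL
def sql(query):
    """Run a SQL query and return the result as a pandas DataFrame."""
    return pd.read_sql(query, conn)

print("Database connected!")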
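The `NTILE` and `SUM(...) OVER (...)` queries in Missions 2 through 4 rely on window functions, which SQLite added in version 3.25.0. A quick check like the one below shows whether the SQLite library bundled with your Python build is new enough. This is a small sketch; the helper name `supports_window_functions` is our own and is not part of the exercise.

```python
import sqlite3

# Window functions (NTILE, SUM(...) OVER (...)) first shipped in SQLite 3.25.0.
MIN_WINDOW_VERSION = (3, 25, 0)


def supports_window_functions() -> bool:
    """Return True if Python's bundled SQLite library supports window functions."""
    return sqlite3.sqlite_version_info >= MIN_WINDOW_VERSION


print("SQLite version:", sqlite3.sqlite_version)
print("Window functions available:", supports_window_functions())
```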

---
## Warm-up: Get to Know the Data

In [None]:
# Check the date range and size of the transaction data
sql("""
SELECT 
    MIN(transaction_date) as first_date,
    MAX(transaction_date) as last_date,
    COUNT(*) as total_transactions,
    COUNT(DISTINCT customer_id) as unique_customers
FROM transactions
""")

In [None]:
# Check the distribution of transactions per customer
sql("""
SELECT 
    MIN(order_count) as min_orders,
    MAX(order_count) as max_orders,
    AVG(order_count) as avg_orders,
    AVG(total_amount) as avg_amount
FROM (
    SELECT 
        customer_id,
        COUNT(*) as order_count,
        SUM(amount) as total_amount
    FROM transactions
    GROUP BY customer_id
)
""")

---
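# Mission 1: Compute the RFM Metrics

Let's compute the R, F, and M values for each customer.

## Mission 1-1: Raw RFM Values per Customer

**Task:** Compute the Recency, Frequency, and Monetary values for each customer.

- **Recency**: days elapsed since the customer's last purchase (reference date: the most recent transaction date in the data)
- **Frequency**: total number of purchases
- **Monetary**: total purchase amount

Output columns: `customer_id`, `recency`, `frequency`, `monetary`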

In [None]:
# TODO: Write the SQL that computes the raw RFM values per customer
sql("""
WITH max_date AS (
    SELECT MAX(transaction_date) as reference_date
    FROM transactions
)
SELECT 
    customer_id,
    julianday(m.reference_date) - julianday(MAX(transaction_date)) as recency,
    _____ as frequency,
    _____ as monetary
FROM transactions t, max_date m
GROUP BY customer_id
ORDER BY monetary DESC
LIMIT 10
""")

<details>
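<summary>Hint (click to expand)</summary>

- Frequency: `COUNT(*)` (total number of purchases)
- Monetary: `SUM(amount)` (total purchase amount)
- Date difference in SQLite: `julianday(date1) - julianday(date2)` (returns the number of days as a floating-point value; works with dates stored as ISO-8601 text such as `YYYY-MM-DD`)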

</details>
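To sanity-check your SQL, it can help to compute the same three metrics by hand on a few rows. The sketch below uses hypothetical toy transactions (not the `crm.db` data) and plain Python `datetime` arithmetic. It then confirms that SQLite's `julianday` difference gives the same day count for one customer, assuming dates are stored as ISO-8601 text (`YYYY-MM-DD`). The helper `compute_rfm` is illustrative only.

```python
import sqlite3
from collections import defaultdict
from datetime import date

# Hypothetical toy transactions: (customer_id, transaction_date, amount)
transactions = [
    (1, date(2024, 1, 5), 100.0),
    (1, date(2024, 3, 1), 50.0),
    (2, date(2024, 2, 10), 300.0),
    (3, date(2024, 3, 10), 20.0),
    (3, date(2024, 3, 10), 30.0),
]


def compute_rfm(rows):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    reference_date = max(d for _, d, _ in rows)  # latest date in the data
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for cid, d, amount in rows:
        last_purchase[cid] = max(d, last_purchase.get(cid, d))
        frequency[cid] += 1       # like COUNT(*)
        monetary[cid] += amount   # like SUM(amount)
    return {
        cid: ((reference_date - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


rfm = compute_rfm(transactions)
print(rfm)  # {1: (9, 2, 150.0), 2: (29, 1, 300.0), 3: (0, 2, 50.0)}

# Same gap for customer 2 via SQLite's julianday (ISO-8601 date strings assumed)
mem = sqlite3.connect(":memory:")
julian_gap = mem.execute(
    "SELECT julianday('2024-03-10') - julianday('2024-02-10')"
).fetchone()[0]
mem.close()
print(julian_gap)  # 29.0
```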

---
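# Mission 2: Assign RFM Scores

Let's split each metric into quintiles and assign scores.

## Mission 2-1: Scoring with NTILE

**Task:** Split each metric into quintiles and give each customer a score from 1 to 5, where 5 is always the best.

Note:
- A **smaller** Recency is better (a more recent purchase), so score it in reverse order: the most recent customers should get 5
- A **larger** Frequency or Monetary is better, so sort ascending: the largest values then land in bucket 5

Output columns: `customer_id`, `recency`, `frequency`, `monetary`, `r_score`, `f_score`, `m_score`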

In [None]:
# TODO: Write the SQL that assigns the RFM scores
sql("""
WITH max_date AS (
    SELECT MAX(transaction_date) as reference_date
    FROM transactions
),
rfm_raw AS (
    SELECT 
        customer_id,
        julianday(m.reference_date) - julianday(MAX(transaction_date)) as recency,
        COUNT(*) as frequency,
        SUM(amount) as monetary
    FROM transactions t, max_date m
    GROUP BY customer_id
)
SELECT 
    customer_id,
    recency,
    frequency,
    monetary,
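    -- Smaller R is better: sort ASC (most recent customers fall into bucket 1), then use 6 - bucket
    6 - NTILE(5) OVER (ORDER BY recency _____) as r_score,
    -- Larger F is better: sort ASC so the highest frequency lands in bucket 5
    NTILE(5) OVER (ORDER BY frequency _____) as f_score,
    -- Larger M is better: sort ASC so the highest spend lands in bucket 5
    NTILE(5) OVER (ORDER BY monetary _____) as m_score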
FROM rfm_raw
ORDER BY r_score + f_score + m_score DESC
LIMIT 10
""")

<details>
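<summary>Hint (click to expand)</summary>

- R score: `ORDER BY recency ASC` (the most recent customers fall into bucket 1, so subtract the bucket from 6)
- F score: `ORDER BY frequency ASC`
- M score: `ORDER BY monetary ASC`
- Alternatively, sort R with `DESC` and use the bucket directly without subtracting. When the customer count is not a multiple of 5, the two versions can differ by one point for customers at bucket boundaries, because `NTILE` makes the earlier buckets larger.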

</details>
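Why the sort direction matters: `NTILE(5)` gives bucket 1 to the first rows in the window's `ORDER BY` order and, when the row count is not a multiple of 5, makes the earlier buckets one row larger. The sketch below reimplements that bucketing in plain Python (the `ntile` helper is our own, for illustration) and compares the two Recency formulations from the hint on 11 hypothetical customers in an in-memory SQLite database. The same reasoning explains why Frequency and Monetary use `ASC`: the largest values come last and therefore land in bucket 5.

```python
import sqlite3


def ntile(n_rows, buckets):
    """Bucket numbers NTILE(buckets) would assign to n_rows already-sorted rows.

    Rows are split as evenly as possible; when n_rows is not a multiple of
    buckets, the earlier buckets get one extra row each.
    """
    base, extra = divmod(n_rows, buckets)
    result = []
    for b in range(1, buckets + 1):
        result.extend([b] * (base + (1 if b <= extra else 0)))
    return result


# Hypothetical recency values (days since last purchase) for 11 customers
recencies = [1, 3, 7, 15, 30, 45, 60, 90, 120, 200, 365]

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE rfm_raw (customer_id INTEGER, recency REAL)")
demo.executemany("INSERT INTO rfm_raw VALUES (?, ?)", enumerate(recencies, start=1))
rows = demo.execute("""
    SELECT recency,
           6 - NTILE(5) OVER (ORDER BY recency ASC) AS r_reversed,
           NTILE(5) OVER (ORDER BY recency DESC)    AS r_desc
    FROM rfm_raw
    ORDER BY recency
""").fetchall()
demo.close()

for recency, r_reversed, r_desc in rows:
    print(f"{recency:>6.0f}  6-NTILE(ASC)={r_reversed}  NTILE(DESC)={r_desc}")
```

Both versions give the most recent customers a 5, but several customers at bucket boundaries (for example, 7 or 120 days here) get different scores, so it is worth picking one convention and using it consistently.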

## Mission 2-2: RFM Score Sum and Combined Code

**Task:** Compute the sum of the RFM scores and their string combination (e.g., "555").

Output columns: `customer_id`, `r_score`, `f_score`, `m_score`, `rfm_score` (string), `rfm_sum` (sum)

In [None]:
# TODO: Write the SQL that computes the combined RFM scores
sql("""
WITH max_date AS (
    SELECT MAX(transaction_date) as reference_date FROM transactions
),
rfm_raw AS (
    SELECT 
        customer_id,
        julianday(m.reference_date) - julianday(MAX(transaction_date)) as recency,
        COUNT(*) as frequency,
        SUM(amount) as monetary
    FROM transactions t, max_date m
    GROUP BY customer_id
),
rfm_scores AS (
    SELECT 
        customer_id,
        recency,
        frequency,
        monetary,
        NTILE(5) OVER (ORDER BY recency DESC) as r_score,   -- most recent purchase → 5
        NTILE(5) OVER (ORDER BY frequency ASC) as f_score,  -- most purchases → 5
        NTILE(5) OVER (ORDER BY monetary ASC) as m_score    -- highest spend → 5
    FROM rfm_raw
)
SELECT 
    customer_id,
    r_score,
    f_score,
    m_score,
    _____ as rfm_score,
    _____ as rfm_sum
FROM rfm_scores
ORDER BY rfm_sum DESC
LIMIT 10
""")

<details>
<summary>Hint (click to expand)</summary>

- Combined RFM string: `CAST(r_score AS TEXT) || CAST(f_score AS TEXT) || CAST(m_score AS TEXT)`
- Or simply: `r_score || f_score || m_score` (SQLite's `||` converts the integers to text automatically)
- RFM sum: `r_score + f_score + m_score`

</details>
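A quick check of the concatenation hint in an in-memory SQLite database (a minimal sketch, independent of `crm.db`) shows that `||` turns integer scores into text, so `rfm_score` is a string such as `'543'` while the sum stays numeric:

```python
import sqlite3

mem = sqlite3.connect(":memory:")
# || converts its integer operands to text before concatenating
rfm_score, score_type, rfm_sum = mem.execute(
    "SELECT 5 || 4 || 3, typeof(5 || 4 || 3), 5 + 4 + 3"
).fetchone()
cast_score = mem.execute(
    "SELECT CAST(5 AS TEXT) || CAST(4 AS TEXT) || CAST(3 AS TEXT)"
).fetchone()[0]
mem.close()

print(rfm_score, score_type, rfm_sum, cast_score)  # 543 text 12 543
```

Because each score is a single digit, sorting by the `rfm_score` string orders customers by R first, then F, then M, which is not the same ranking as sorting by `rfm_sum`.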

---
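# Mission 3: Segment Classification

Let's classify customers into segments based on their RFM scores.

## Mission 3-1: Define the Segments

**Task:** Assign a segment to each customer based on their combination of RFM scores.

| Segment | Condition |
|----------|------|
| Champions | R >= 4 AND F >= 4 AND M >= 4 |
| Loyal | F >= 4 |
| Potential Loyalist | R >= 4 AND F = 2 or 3 |
| New Customers | R >= 4 AND F = 1 |
| At Risk | R <= 2 AND F >= 3 |
| Cannot Lose | R <= 2 AND F >= 4 AND M >= 4 |
| Lost | R <= 2 AND F <= 2 |

The conditions overlap (for example, a customer with R = 1, F = 5, M = 5 satisfies Loyal, At Risk, and Cannot Lose), and combinations that match no rule, such as R = 3 with F <= 3, fall into `Others`.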

In [None]:
# TODO: Write the SQL that assigns the segments
sql("""
WITH max_date AS (
    SELECT MAX(transaction_date) as reference_date FROM transactions
),
rfm_raw AS (
    SELECT 
        customer_id,
        julianday(m.reference_date) - julianday(MAX(transaction_date)) as recency,
        COUNT(*) as frequency,
        SUM(amount) as monetary
    FROM transactions t, max_date m
    GROUP BY customer_id
),
rfm_scores AS (
    SELECT 
        customer_id,
        recency,
        frequency,
        monetary,
        NTILE(5) OVER (ORDER BY recency DESC) as r_score,   -- most recent purchase → 5
        NTILE(5) OVER (ORDER BY frequency ASC) as f_score,  -- most purchases → 5
        NTILE(5) OVER (ORDER BY monetary ASC) as m_score    -- highest spend → 5
    FROM rfm_raw
)
SELECT 
    customer_id,
    r_score,
    f_score,
    m_score,
    r_score || f_score || m_score as rfm_score,
    CASE 
        WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
        WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'Cannot Lose'
        WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
        WHEN f_score >= 4 THEN _____
        WHEN r_score >= 4 AND f_score = 1 THEN _____
        WHEN r_score >= 4 AND f_score BETWEEN 2 AND 3 THEN _____
        WHEN r_score <= 2 AND f_score <= 2 THEN _____
        ELSE 'Others'
    END as segment
FROM rfm_scores
ORDER BY r_score + f_score + m_score DESC
LIMIT 20
""")

<details>
<summary>Hint (click to expand)</summary>

- Loyal: `'Loyal'`
- New Customers: `'New Customers'`
- Potential Loyalist: `'Potential Loyalist'`
- Lost: `'Lost'`

The order of the CASE branches matters: the first matching `WHEN` wins, so put the more specific conditions first.

</details>
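Because the first matching `WHEN` wins, it can help to express the same rules as a plain Python function and check how every score combination is labeled. This is an illustrative mirror of the CASE expression above (using the `F = 1` condition for New Customers from the table), not part of the SQL solution; `assign_segment` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product


def assign_segment(r, f, m):
    """Plain-Python mirror of the segment CASE: the first matching rule wins."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    if r <= 2 and f >= 4 and m >= 4:
        return "Cannot Lose"
    if r <= 2 and f >= 3:
        return "At Risk"
    if f >= 4:
        return "Loyal"
    if r >= 4 and f == 1:
        return "New Customers"
    if r >= 4 and 2 <= f <= 3:
        return "Potential Loyalist"
    if r <= 2 and f <= 2:
        return "Lost"
    return "Others"


# Count how all 5 x 5 x 5 = 125 score combinations are labeled
counts = Counter(assign_segment(r, f, m) for r, f, m in product(range(1, 6), repeat=3))
print(counts)
print(assign_segment(1, 5, 5))  # Cannot Lose (it also satisfies At Risk and Loyal)
```

Enumerating all combinations shows that 15 of the 125 (R = 3 with F <= 3) fall through to `Others`.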

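## Mission 3-2: Summary Statistics by Segment

**Task:** For each segment, compute the number of customers, the average and total monetary value, and the segment's share of total revenue.

Output columns: `segment`, `customer_count`, `avg_monetary`, `total_monetary`, `revenue_share`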

In [None]:
# TODO: Write the SQL that computes summary statistics by segment
sql("""
WITH max_date AS (
    SELECT MAX(transaction_date) as reference_date FROM transactions
),
rfm_raw AS (
    SELECT 
        customer_id,
        julianday(m.reference_date) - julianday(MAX(transaction_date)) as recency,
        COUNT(*) as frequency,
        SUM(amount) as monetary
    FROM transactions t, max_date m
    GROUP BY customer_id
),
rfm_scores AS (
    SELECT 
        customer_id,
        recency,
        frequency,
        monetary,
        NTILE(5) OVER (ORDER BY recency DESC) as r_score,   -- most recent purchase → 5
        NTILE(5) OVER (ORDER BY frequency ASC) as f_score,  -- most purchases → 5
        NTILE(5) OVER (ORDER BY monetary ASC) as m_score    -- highest spend → 5
    FROM rfm_raw
),
rfm_segments AS (
    SELECT 
        customer_id,
        monetary,
        CASE 
            WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champions'
            WHEN r_score <= 2 AND f_score >= 4 AND m_score >= 4 THEN 'Cannot Lose'
            WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
            WHEN f_score >= 4 THEN 'Loyal'
            WHEN r_score >= 4 AND f_score = 1 THEN 'New Customers'
            WHEN r_score >= 4 AND f_score BETWEEN 2 AND 3 THEN 'Potential Loyalist'
            WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
            ELSE 'Others'
        END as segment
    FROM rfm_scores
),
total_revenue AS (
    SELECT SUM(monetary) as total FROM rfm_segments
)
SELECT 
    segment,
    COUNT(*) as customer_count,
    ROUND(AVG(monetary), 0) as avg_monetary,
    ROUND(SUM(monetary), 0) as total_monetary,
    ROUND(_____ * 100, 2) as revenue_share
FROM rfm_segments, total_revenue
GROUP BY segment
ORDER BY revenue_share DESC
""")

<details>
<summary>Hint (click to expand)</summary>

- Revenue share: `SUM(monetary) * 1.0 / total` (multiplying by `1.0` forces floating-point division)
- `total` comes from the `total_revenue` CTE

</details>
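The hint's `SUM(monetary) * 1.0 / total` works because SQLite accepts the non-aggregated `total` column in a `GROUP BY` query (a "bare column"). Stricter engines such as PostgreSQL reject that unless `total` is grouped or wrapped in an aggregate. One portable alternative is to divide by a window function evaluated over the aggregate, sketched here on hypothetical toy rows in an in-memory database:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE rfm_segments (customer_id INTEGER, segment TEXT, monetary REAL)")
demo.executemany(
    "INSERT INTO rfm_segments VALUES (?, ?, ?)",
    [(1, "Champions", 500.0), (2, "Champions", 300.0), (3, "Lost", 150.0), (4, "Lost", 50.0)],
)
# SUM(SUM(monetary)) OVER () is a window over the grouped rows, i.e. the grand total
shares = demo.execute("""
    SELECT segment,
           COUNT(*) AS customer_count,
           ROUND(SUM(monetary) * 100.0 / SUM(SUM(monetary)) OVER (), 2) AS revenue_share
    FROM rfm_segments
    GROUP BY segment
    ORDER BY revenue_share DESC
""").fetchall()
demo.close()
print(shares)  # [('Champions', 2, 80.0), ('Lost', 2, 20.0)]
```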

---
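# Mission 4: Testing the Pareto Principle

Let's check what share of total revenue comes from the top 20% of customers.

## Mission 4-1: Revenue Share of the Top N% of Customers

**Task:** Compute the percentage of total revenue contributed by the top 20% of customers.

(Pareto principle: a rule of thumb that the top 20% of customers generate about 80% of revenue.)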

In [None]:
# TODO: Write the SQL that computes the revenue share of the top 20% of customers
sql("""
WITH customer_monetary AS (
    SELECT 
        customer_id,
        SUM(amount) as monetary
    FROM transactions
    GROUP BY customer_id
),
ranked AS (
    SELECT 
        customer_id,
        monetary,
        NTILE(5) OVER (ORDER BY monetary DESC) as quintile
    FROM customer_monetary
),
total AS (
    SELECT SUM(monetary) as total_revenue FROM customer_monetary
)
SELECT 
    'Top 20%' as segment,
    COUNT(*) as customers,
    ROUND(SUM(r.monetary), 0) as revenue,
    ROUND(_____ * 100, 2) as revenue_share
FROM ranked r, total t
WHERE quintile = 1
""")

<details>
<summary>Hint (click to expand)</summary>

- Revenue share: `SUM(r.monetary) * 1.0 / t.total_revenue`
- `quintile = 1` is the top 20% (customers are ranked by `monetary DESC`)

</details>
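The same calculation in plain Python, on a hypothetical list of per-customer revenue, makes the arithmetic explicit. `top_quintile_share` is an illustrative helper that sizes the top group the way `NTILE(5)` sizes bucket 1, rounding up when the customer count is not a multiple of 5.

```python
def top_quintile_share(values, buckets=5):
    """Percent of total revenue held by the top bucket, sized like NTILE's bucket 1."""
    ordered = sorted(values, reverse=True)
    k = -(-len(ordered) // buckets)  # ceiling division: NTILE's first bucket is the largest
    return 100.0 * sum(ordered[:k]) / sum(ordered)


# Hypothetical per-customer revenue for 10 customers
monetary = [1000, 500, 300, 100, 50, 20, 10, 10, 5, 5]
print(top_quintile_share(monetary))  # 75.0 (the top 2 customers hold 1500 of 2000)
```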

## Mission 4-2: Cumulative Revenue Share

**Task:** Compute the cumulative revenue share for each quintile.

Output: accumulated in order, Top 20% → Top 40% → Top 60% → ...

In [None]:
# TODO: Write the SQL that computes the cumulative revenue share
sql("""
WITH customer_monetary AS (
    SELECT 
        customer_id,
        SUM(amount) as monetary
    FROM transactions
    GROUP BY customer_id
),
ranked AS (
    SELECT 
        customer_id,
        monetary,
        NTILE(5) OVER (ORDER BY monetary DESC) as quintile
    FROM customer_monetary
),
quintile_summary AS (
    SELECT 
        quintile,
        COUNT(*) as customers,
        SUM(monetary) as revenue
    FROM ranked
    GROUP BY quintile
),
total AS (
    SELECT SUM(revenue) as total_revenue FROM quintile_summary
)
SELECT 
    'Top ' || (quintile * 20) || '%' as segment,
    SUM(q.customers) OVER (ORDER BY q.quintile) as cumulative_customers,
    ROUND(SUM(q.revenue) OVER (ORDER BY q.quintile), 0) as cumulative_revenue,
    ROUND(SUM(q.revenue) OVER (ORDER BY q.quintile) * 100.0 / t.total_revenue, 2) as cumulative_share
FROM quintile_summary q, total t
ORDER BY q.quintile
""")

**Questions to consider:**
- Does the Pareto principle (80/20) hold for this data?
- Should we focus more on top customers, or on moving lower-tier customers up?
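To see what a cumulative-share curve looks like before interpreting yours, here is a plain-Python sketch on hypothetical revenue figures. `cumulative_quintile_shares` is an illustrative helper that mimics `NTILE(5)` bucket sizes and then accumulates the bucket totals with `itertools.accumulate`. In this toy data the top 20% holds 75% of revenue, close to but below the 80/20 rule of thumb.

```python
from itertools import accumulate


def cumulative_quintile_shares(values, buckets=5):
    """Cumulative revenue share (%) after each NTILE-style bucket, largest values first."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    base, extra = divmod(len(ordered), buckets)
    bucket_totals, start = [], 0
    for b in range(buckets):
        size = base + (1 if b < extra else 0)  # earlier buckets take the extra rows
        bucket_totals.append(sum(ordered[start:start + size]))
        start += size
    return [round(100.0 * running / total, 2) for running in accumulate(bucket_totals)]


# Hypothetical per-customer revenue for 10 customers
monetary = [1000, 500, 300, 100, 50, 20, 10, 10, 5, 5]
for pct, share in zip(range(20, 101, 20), cumulative_quintile_shares(monetary)):
    print(f"Top {pct}%: {share}%")
```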

---
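# Mission 5: Real-World Scenario

**Scenario:**

> The marketing team is planning campaigns tailored to each customer segment.
> Based on your RFM analysis, work out the following:
>
> 1. The most valuable segment and its characteristics
> 2. Segments that need immediate action (At Risk, Cannot Lose)
> 3. Marketing recommendations for each segment

**Write your analysis below:**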

In [None]:
# Write any analysis queries you like

# Detailed analysis by segment
sql("""
-- Write your query here

""")

### My Findings and Recommendations

(Write your findings here.)

1. **Most valuable segment**: 
2. **Segments needing immediate action**: 
3. **Marketing recommendations by segment**:
   - Champions: 
   - Loyal: 
   - At Risk: 
   - Lost: 

---
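# Retrospective

As you finish the exercise, answer the questions below.

1. **What was difficult?**

2. **What new SQL techniques did you learn?**

3. **How would you use RFM analysis in practice?**

4. **How would you explain this work in a job interview?**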

In [None]:
# Close the database connection
conn.close()
print("Exercise complete!")