# Case Study: E-commerce Funnel Analysis and Conversion Bottleneck Diagnosis

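> **Data**: [eCommerce Events History in Cosmetics Shop](https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-cosmetics-shop) (CC0, ~20M events)
>
> **Summary**: This notebook analyzes the event log (view/cart/purchase) of an online cosmetics shop.
> It goes beyond the overall funnel conversion rate and explores **category bottlenecks → price sensitivity → time-of-day patterns**
> to answer the question "Where are we losing customers, and how can we win them back?"
>
> **Note**: The raw data is large (~20M rows), so the analysis uses a one-week sample from November 2019 for reproducibility.
> If the data file is not available, the notebook falls back to simulated data based on industry benchmarks.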
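The loading cell does not create the one-week sample. It only reads `cosmetics_events.csv` if that file exists. The sketch below shows one way to cut such a file from a monthly Kaggle export. It streams the export in chunks so the ~20M rows never sit in memory at once. Several details are illustrative assumptions rather than part of the original notebook: the helper name `extract_week`, the chunk size, the 2019-11-01 to 2019-11-08 window, and the idea that `event_time` strings may end in ` UTC`.

```python
import io
import pandas as pd

def extract_week(source, start, end, chunksize=1_000_000):
    """Stream an event CSV in chunks and keep rows with start <= event_time < end."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    kept = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Assumption: timestamps may carry a trailing ' UTC'; strip it so they parse as naive datetimes
        ts = pd.to_datetime(chunk['event_time'].astype(str).str.replace(' UTC', '', regex=False))
        mask = (ts >= start) & (ts < end)
        part = chunk.loc[mask].copy()
        part['event_time'] = ts[mask]
        kept.append(part)
    return pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()

# Tiny in-memory stand-in for a monthly export (hypothetical rows)
demo_csv = io.StringIO(
    "event_time,event_type,product_id,price,user_id\n"
    "2019-10-31 23:59:00 UTC,view,1,10.0,1\n"
    "2019-11-01 00:00:00 UTC,view,1,10.0,1\n"
    "2019-11-03 12:00:00 UTC,cart,1,10.0,1\n"
    "2019-11-08 00:00:00 UTC,purchase,1,10.0,1\n"
)
week = extract_week(demo_csv, '2019-11-01', '2019-11-08', chunksize=2)
print(week['event_type'].tolist())  # ['view', 'cart']
```

Assuming the November file is named `2019-Nov.csv` (a guess), `extract_week('2019-Nov.csv', '2019-11-01', '2019-11-08').to_csv('cosmetics_events.csv', index=False)` would write the file that the loading cell looks for.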

---

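## Structure of This Analysis

1. **Data exploration**: event distribution, user behavior patterns
2. **Overall funnel**: stage-by-stage conversion for view → cart → purchase
3. **Category funnels**: which categories lose the most users?
4. **Price sensitivity**: cart→purchase conversion by price range
5. **Time patterns**: conversion performance by day of week and hour
6. **Business recommendations**: strategies to relieve bottlenecks, plus the revenue opportunity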

In [None]:
import sys, os, warnings
sys.path.insert(0, os.path.abspath('..'))
warnings.filterwarnings('ignore', category=FutureWarning)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({'figure.figsize': (12, 5), 'font.size': 11})

# --- Data Loading ---
# Option 1: Real Kaggle data (if available)
DATA_FILE = 'cosmetics_events.csv'
USE_SIMULATED = False

if os.path.exists(DATA_FILE):
    df = pd.read_csv(DATA_FILE, parse_dates=['event_time'])
    print(f'Loaded real data: {len(df):,} events')
else:
    # Option 2: Generate realistic simulation data
    # Based on industry benchmarks: view→cart ~8%, cart→purchase ~30%
    USE_SIMULATED = True
    print('Real data not found. Generating simulation based on industry benchmarks...')
    
    rng = np.random.default_rng(42)
    n_users = 50_000
    
    categories = {
        'skincare.cream': {'view_weight': 0.25, 'cart_rate': 0.10, 'purchase_rate': 0.35, 'avg_price': 28},
        'skincare.serum': {'view_weight': 0.15, 'cart_rate': 0.08, 'purchase_rate': 0.30, 'avg_price': 45},
        'makeup.lipstick': {'view_weight': 0.20, 'cart_rate': 0.12, 'purchase_rate': 0.40, 'avg_price': 15},
        'makeup.foundation': {'view_weight': 0.15, 'cart_rate': 0.07, 'purchase_rate': 0.25, 'avg_price': 35},
        'fragrance.perfume': {'view_weight': 0.10, 'cart_rate': 0.05, 'purchase_rate': 0.20, 'avg_price': 65},
        'hair.shampoo': {'view_weight': 0.15, 'cart_rate': 0.09, 'purchase_rate': 0.45, 'avg_price': 12},
    }
    
    events = []
    base_date = pd.Timestamp('2019-11-01')
    
    # Hour-of-day weights (evening peak), normalized once outside the user loop
    hour_weights = np.array([0.5,0.3,0.2,0.1,0.1,0.2,0.5,1.0,1.5,2.0,2.5,2.5,
                             2.0,2.0,2.5,2.5,2.0,2.0,2.5,3.0,3.5,3.0,2.0,1.0])
    hour_weights /= hour_weights.sum()
    
    for user_id in range(1, n_users + 1):
        # Each user views 1-8 products
        n_views = rng.integers(1, 9)
        cats = rng.choice(list(categories.keys()), size=n_views,
                          p=[c['view_weight'] for c in categories.values()])
        
        for cat in cats:
            info = categories[cat]
            day_offset = rng.integers(0, 7)
            hour = rng.choice(24, p=hour_weights)
            minute = rng.integers(0, 60)
            ts = base_date + pd.Timedelta(days=int(day_offset), hours=int(hour), minutes=int(minute))
            price = max(1, info['avg_price'] * rng.lognormal(0, 0.3))
            product_id = rng.integers(1000, 9999)
            
            # View event
            events.append({'event_time': ts, 'event_type': 'view', 'product_id': product_id,
                           'category_code': cat, 'price': round(price, 2), 'user_id': user_id})
            
            # Cart event (with probability)
            if rng.random() < info['cart_rate']:
                ts_cart = ts + pd.Timedelta(minutes=int(rng.integers(1, 30)))
                events.append({'event_time': ts_cart, 'event_type': 'cart', 'product_id': product_id,
                               'category_code': cat, 'price': round(price, 2), 'user_id': user_id})
                
                # Purchase event (with probability, conditional on cart)
                # Price sensitivity: higher price → lower purchase rate
                price_factor = max(0.3, 1 - (price - 20) * 0.005)
                if rng.random() < info['purchase_rate'] * price_factor:
                    ts_buy = ts_cart + pd.Timedelta(minutes=int(rng.integers(1, 120)))
                    events.append({'event_time': ts_buy, 'event_type': 'purchase', 'product_id': product_id,
                                   'category_code': cat, 'price': round(price, 2), 'user_id': user_id})
    
    df = pd.DataFrame(events)
    df = df.sort_values('event_time').reset_index(drop=True)
    print(f'Generated: {len(df):,} events, {df["user_id"].nunique():,} users')

if USE_SIMULATED:
    print('\n\u26a0\ufe0f  Using SIMULATED data based on industry benchmarks.')
    print('   Real data: https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-cosmetics-shop')

print(f'\nDataset: {len(df):,} events, {df["user_id"].nunique():,} users')
print(f'Columns: {list(df.columns)}')
df.head()

---

## 1. Exploratory Data Analysis (EDA)

In [None]:
# === Event distribution ===
print('=== Event Distribution ===')
event_counts = df['event_type'].value_counts()
for event, count in event_counts.items():
    print(f'  {event:>10s}: {count:>10,} ({count/len(df):>6.1%})')

print(f'\n=== Category Distribution (Top 6) ===')
cat_counts = df['category_code'].value_counts().head(6)
for cat, count in cat_counts.items():
    print(f'  {cat:>25s}: {count:>8,}')

print(f'\n=== Price Statistics ===')
print(df['price'].describe().round(2).to_string())

print(f'\n=== User Behavior ===')
events_per_user = df.groupby('user_id').size()
print(f'  Events per user (median): {events_per_user.median():.0f}')
print(f'  Events per user (mean):   {events_per_user.mean():.1f}')
print(f'  Users with purchase:      {df[df["event_type"]=="purchase"]["user_id"].nunique():,}')

In [None]:
# === EDA visualization ===
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# (1) Event type distribution
colors_event = {'view': '#6366f1', 'cart': '#f59e0b', 'purchase': '#22c55e'}
event_order = ['view', 'cart', 'purchase']
event_vals = [event_counts.get(e, 0) for e in event_order]
bars = axes[0].bar(event_order, event_vals,
                    color=[colors_event[e] for e in event_order], edgecolor='white')
for bar, val in zip(bars, event_vals):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(event_vals)*0.02,
                 f'{val:,}', ha='center', fontweight='bold')
axes[0].set_title('(1) Event Type Distribution')
axes[0].set_ylabel('Count')
axes[0].yaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))

# (2) Price distribution by event type
for event in event_order:
    subset = df[df['event_type'] == event]['price']
    subset_capped = subset[subset < subset.quantile(0.95)]
    axes[1].hist(subset_capped, bins=50, alpha=0.5, color=colors_event[event],
                 label=event, density=True, edgecolor='none')
axes[1].set_title('(2) Price Distribution by Event Type')
axes[1].set_xlabel('Price ($)')
axes[1].set_ylabel('Density')
axes[1].legend()

# (3) Hourly event volume
df['hour'] = pd.to_datetime(df['event_time']).dt.hour
hourly = df.groupby(['hour', 'event_type']).size().unstack(fill_value=0)
for event in event_order:
    if event in hourly.columns:
        axes[2].plot(hourly.index, hourly[event], 'o-', color=colors_event[event],
                     linewidth=2, markersize=4, label=event)
axes[2].set_title('(3) Hourly Event Volume')
axes[2].set_xlabel('Hour of Day')
axes[2].set_ylabel('Events')
axes[2].set_xticks(range(0, 24, 2))
axes[2].legend()

plt.suptitle('Exploratory Data Analysis \u2014 E-commerce Event Funnel',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('funnel_eda.png', dpi=150, bbox_inches='tight')
plt.show()

---

## 2. Overall Funnel Analysis

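User-based funnel: conversion rates at each stage are computed from the **number of unique users who performed that action**. A user who views several products therefore counts only once at the View stage.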
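The unit of counting matters. The toy example below uses hypothetical data to compare an event-based View→Cart ratio with the user-based ratio that `calculate_funnel` uses. Repeated views by one user inflate the event denominator, so the two definitions can diverge sharply.

```python
import pandas as pd

# Hypothetical toy log: user 1 views three times, carts, and buys;
# user 2 views once and carts; user 3 only views.
toy_events = pd.DataFrame({
    'user_id':    [1, 1, 1, 1, 1, 2, 2, 3],
    'event_type': ['view', 'view', 'view', 'cart', 'purchase', 'view', 'cart', 'view'],
})

toy_stages = ['view', 'cart', 'purchase']
# Event-based: every row counts
toy_event_counts = toy_events['event_type'].value_counts().reindex(toy_stages, fill_value=0)
# User-based (as in calculate_funnel): each user counts once per stage
toy_user_counts = toy_events.groupby('event_type')['user_id'].nunique().reindex(toy_stages, fill_value=0)

print('event-based View->Cart:', toy_event_counts['cart'] / toy_event_counts['view'])  # 2 / 5 = 0.4
print('user-based  View->Cart:', toy_user_counts['cart'] / toy_user_counts['view'])    # 2 / 3, about 0.667
```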

In [None]:
# === Overall funnel calculation ===
def calculate_funnel(data):
    """Calculate user-based funnel metrics."""
    viewers = data[data['event_type'] == 'view']['user_id'].nunique()
    carters = data[data['event_type'] == 'cart']['user_id'].nunique()
    purchasers = data[data['event_type'] == 'purchase']['user_id'].nunique()
    
    return {
        'viewers': viewers,
        'carters': carters,
        'purchasers': purchasers,
        'view_to_cart': carters / viewers if viewers > 0 else 0,
        'cart_to_purchase': purchasers / carters if carters > 0 else 0,
        'overall_cvr': purchasers / viewers if viewers > 0 else 0,
        'cart_abandonment': 1 - (purchasers / carters) if carters > 0 else 0,
    }

funnel = calculate_funnel(df)

print('=== Overall Funnel ===')
print(f'  View:     {funnel["viewers"]:>8,} users (100.0%)')
print(f'  Cart:     {funnel["carters"]:>8,} users ({funnel["view_to_cart"]:>6.1%}) \u2190 View-to-Cart')
print(f'  Purchase: {funnel["purchasers"]:>8,} users ({funnel["overall_cvr"]:>6.1%}) \u2190 Overall CVR')
print(f'\n  View \u2192 Cart:     {funnel["view_to_cart"]:.1%}')
print(f'  Cart \u2192 Purchase: {funnel["cart_to_purchase"]:.1%}')
print(f'  Cart Abandonment: {funnel["cart_abandonment"]:.1%}')
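# Compute the label outside the f-string: backslashes inside f-string expressions are a SyntaxError before Python 3.12
biggest_drop = 'View\u2192Cart' if (1 - funnel['view_to_cart']) > funnel['cart_abandonment'] else 'Cart\u2192Purchase'
print(f'\n  \u2192 Biggest drop: {biggest_drop}')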

In [None]:
# === Key visualization #1: funnel chart ===
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left: Funnel bar chart
stages = ['View', 'Cart', 'Purchase']
values = [funnel['viewers'], funnel['carters'], funnel['purchasers']]
colors_funnel = ['#6366f1', '#f59e0b', '#22c55e']

bars = axes[0].barh(stages[::-1], values[::-1], color=colors_funnel[::-1],
                     edgecolor='white', height=0.5)

for bar, val in zip(bars, values[::-1]):
    pct = val / funnel['viewers'] * 100
    axes[0].text(bar.get_width() + max(values) * 0.02, bar.get_y() + bar.get_height()/2,
                 f'{val:,} ({pct:.1f}%)', va='center', fontweight='bold', fontsize=11)

axes[0].set_title('User Funnel: View \u2192 Cart \u2192 Purchase')
axes[0].set_xlabel('Unique Users')
axes[0].xaxis.set_major_formatter(mticker.FuncFormatter(lambda x, _: f'{x:,.0f}'))

# Right: Step conversion rates
step_labels = ['View\u2192Cart', 'Cart\u2192Purchase']
step_rates = [funnel['view_to_cart'] * 100, funnel['cart_to_purchase'] * 100]
drop_rates = [(1 - funnel['view_to_cart']) * 100, funnel['cart_abandonment'] * 100]

x_steps = np.arange(len(step_labels))
w = 0.35
axes[1].bar(x_steps - w/2, step_rates, w, color='#22c55e', label='Converted', edgecolor='white')
axes[1].bar(x_steps + w/2, drop_rates, w, color='#ef4444', label='Dropped', edgecolor='white')

for i, (conv, drop) in enumerate(zip(step_rates, drop_rates)):
    axes[1].text(i - w/2, conv + 1, f'{conv:.1f}%', ha='center', fontweight='bold', fontsize=11)
    axes[1].text(i + w/2, drop + 1, f'{drop:.1f}%', ha='center', fontweight='bold', fontsize=11, color='#ef4444')

axes[1].set_xticks(x_steps)
axes[1].set_xticklabels(step_labels)
axes[1].set_ylabel('Rate (%)')
axes[1].set_title('Step Conversion vs Drop-off Rate')
axes[1].legend()

plt.suptitle('E-commerce Conversion Funnel',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('funnel_overall.png', dpi=150, bbox_inches='tight')
plt.show()

---

## 3. Category Funnel Comparison

The overall average hides differences between categories. This section identifies the categories with the heaviest drop-off.
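The loop in the next cell calls `calculate_funnel` once per category and filters the full frame each time. The sketch below is an equivalent single-pass version. It counts unique users per category and stage with one `groupby`, and it assumes the column names used in this notebook (`category_code`, `event_type`, `user_id`). `category_funnel` is a hypothetical helper. Like `calculate_funnel`, it returns 0 when a denominator is zero. Rows with a missing `category_code` are dropped by `groupby`'s default behavior.

```python
import pandas as pd

def category_funnel(events: pd.DataFrame) -> pd.DataFrame:
    """Unique users per (category, stage) from one groupby, plus step conversion rates."""
    users = (events.groupby(['category_code', 'event_type'])['user_id']
                   .nunique()
                   .unstack(fill_value=0)
                   .reindex(columns=['view', 'cart', 'purchase'], fill_value=0))
    out = users.rename(columns={'view': 'viewers', 'cart': 'carters', 'purchase': 'purchasers'})
    viewers = out['viewers'].where(out['viewers'] > 0)   # NaN where zero, avoiding division by zero
    carters = out['carters'].where(out['carters'] > 0)
    out['view_to_cart'] = (out['carters'] / viewers).fillna(0)
    out['cart_to_purchase'] = (out['purchasers'] / carters).fillna(0)
    out['overall_cvr'] = (out['purchasers'] / viewers).fillna(0)
    return out.sort_values('viewers', ascending=False)

# Hypothetical toy log with two categories
toy_cat = pd.DataFrame({
    'user_id':       [1, 1, 2, 3, 3, 3, 4],
    'category_code': ['a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'event_type':    ['view', 'cart', 'view', 'view', 'cart', 'purchase', 'view'],
})
cat_result = category_funnel(toy_cat)
print(cat_result[['viewers', 'view_to_cart', 'cart_to_purchase', 'overall_cvr']])
```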

In [None]:
# === Funnel by category ===
top_cats = df[df['event_type'] == 'view']['category_code'].value_counts().head(6).index.tolist()

cat_funnels = []
for cat in top_cats:
    cat_df = df[df['category_code'] == cat]
    f = calculate_funnel(cat_df)
    f['category'] = cat.split('.')[-1] if '.' in cat else cat  # short name
    f['category_full'] = cat
    cat_funnels.append(f)

cat_funnel_df = pd.DataFrame(cat_funnels)

print('=== Funnel by Category ===')
print(f'{"Category":>15s}  {"Viewers":>8s}  {"V→C":>6s}  {"C→P":>6s}  {"Overall":>8s}  {"Abandon":>8s}')
for _, row in cat_funnel_df.iterrows():
    print(f'{row["category"]:>15s}  {row["viewers"]:>8,}  {row["view_to_cart"]:>6.1%}  '
          f'{row["cart_to_purchase"]:>6.1%}  {row["overall_cvr"]:>8.2%}  {row["cart_abandonment"]:>8.1%}')

In [None]:
# === Key visualization #2: category funnel comparison ===
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

cats_short = cat_funnel_df['category'].values
x_cat = np.arange(len(cats_short))
w = 0.3

# Left: View-to-Cart vs Cart-to-Purchase by category
axes[0].bar(x_cat - w/2, cat_funnel_df['view_to_cart'] * 100, w,
            color='#6366f1', label='View\u2192Cart', edgecolor='white')
axes[0].bar(x_cat + w/2, cat_funnel_df['cart_to_purchase'] * 100, w,
            color='#22c55e', label='Cart\u2192Purchase', edgecolor='white')

axes[0].set_xticks(x_cat)
axes[0].set_xticklabels(cats_short, rotation=20, ha='right', fontsize=9)
axes[0].set_ylabel('Conversion Rate (%)')
axes[0].set_title('Step Conversion by Category')
axes[0].legend()

# Right: Overall CVR + Cart Abandonment
axes[1].bar(x_cat - w/2, cat_funnel_df['overall_cvr'] * 100, w,
            color='#22c55e', label='Overall CVR', edgecolor='white')
axes[1].bar(x_cat + w/2, cat_funnel_df['cart_abandonment'] * 100, w,
            color='#ef4444', label='Cart Abandonment', edgecolor='white')

axes[1].set_xticks(x_cat)
axes[1].set_xticklabels(cats_short, rotation=20, ha='right', fontsize=9)
axes[1].set_ylabel('Rate (%)')
axes[1].set_title('Overall CVR vs Cart Abandonment')
axes[1].legend()

plt.suptitle('Category Funnel Comparison \u2014 Where Are the Bottlenecks?',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('funnel_category_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

---

## 4. Price Sensitivity Analysis

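This section analyzes the cart-to-purchase rate by price range to quantify how strongly price is associated with the purchase decision. The result is a correlation, not a causal effect (see the limitations below).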
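Price bins with few carts produce noisy rates, so it helps to attach an uncertainty band before reading a trend into them. The sketch below uses the Wilson score interval for a binomial proportion. `wilson_interval` is a hypothetical helper, and z = 1.96 (about 95% coverage) is an assumed default.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95% coverage."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the proportion is unconstrained
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))

# Hypothetical bin: 30 purchases out of 100 carts
print(wilson_interval(30, 100))  # roughly (0.219, 0.396)
```

To apply it to the price table, something like `price_cvr.apply(lambda r: wilson_interval(int(r['purchased']), int(r['total'])), axis=1)` would give one (low, high) pair per bin. If adjacent bins have overlapping intervals, the difference between them may be noise.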

In [None]:
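# === Conversion rate by price range ===
# Label each cart event by whether the same user purchased the same product
cart_events = df[df['event_type'] == 'cart'][['user_id', 'product_id', 'price', 'category_code']].copy()
# One row per (user, product) purchase, so the left merge below cannot duplicate cart rows
purchase_events = df[df['event_type'] == 'purchase'][['user_id', 'product_id']].drop_duplicates().copy()
purchase_events['purchased'] = 1

# Match carts to purchases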
cart_with_outcome = cart_events.merge(
    purchase_events, on=['user_id', 'product_id'], how='left'
)
cart_with_outcome['purchased'] = cart_with_outcome['purchased'].fillna(0).astype(int)

# Cart-to-purchase rate by price range
price_bins = [0, 10, 20, 30, 50, 75, 100, float('inf')]
price_labels = ['$0-10', '$10-20', '$20-30', '$30-50', '$50-75', '$75-100', '$100+']
cart_with_outcome['price_bin'] = pd.cut(cart_with_outcome['price'], bins=price_bins, labels=price_labels)

price_cvr = cart_with_outcome.groupby('price_bin', observed=True).agg(
    total=('purchased', 'count'),
    purchased=('purchased', 'sum'),
    avg_price=('price', 'mean')
).reset_index()
price_cvr['cvr'] = price_cvr['purchased'] / price_cvr['total']

print('=== Cart-to-Purchase Rate by Price Range ===')
print(f'{"Price Range":>12s}  {"Carts":>8s}  {"Purchased":>10s}  {"CVR":>8s}  {"Avg Price":>10s}')
for _, row in price_cvr.iterrows():
    print(f'{str(row["price_bin"]):>12s}  {row["total"]:>8,}  {row["purchased"]:>10,}  '
          f'{row["cvr"]:>8.1%}  ${row["avg_price"]:>9.2f}')

In [None]:
# === Key visualization #3: price vs conversion rate ===
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left: Bar chart - CVR by price range (green = above the median bin, red = at or below)
colors_price = ['#22c55e' if cvr > price_cvr['cvr'].median() else '#ef4444'
                for cvr in price_cvr['cvr']]
bars = axes[0].bar(range(len(price_cvr)), price_cvr['cvr'] * 100,
                    color=colors_price, edgecolor='white')
# Reference line: overall cart-to-purchase rate weighted by cart count (not a plain mean of bin rates)
overall_c2p = cart_with_outcome['purchased'].mean()
axes[0].axhline(y=overall_c2p * 100, color='#6366f1', linestyle='--',
                linewidth=2, label=f'Overall: {overall_c2p*100:.1f}%')

for i, (_, row) in enumerate(price_cvr.iterrows()):
    axes[0].text(i, row['cvr'] * 100 + 1, f'{row["cvr"]*100:.1f}%',
                 ha='center', fontweight='bold', fontsize=9)

axes[0].set_xticks(range(len(price_cvr)))
axes[0].set_xticklabels([str(p) for p in price_cvr['price_bin']], rotation=20, fontsize=9)
axes[0].set_ylabel('Cart-to-Purchase Rate (%)')
axes[0].set_title('Cart-to-Purchase Rate by Price Range')
axes[0].legend()

# Right: cart-to-purchase rate in $5 price bins up to the 95th percentile (bubble size ∝ cart count)
fine_bins = np.arange(0, cart_with_outcome['price'].quantile(0.95), 5)
cart_with_outcome['price_fine'] = pd.cut(cart_with_outcome['price'], bins=fine_bins)
trend = cart_with_outcome.groupby('price_fine', observed=True)['purchased'].agg(['mean', 'count']).reset_index()
trend = trend[trend['count'] >= 10]
trend['midpoint'] = [(x.left + x.right) / 2 for x in trend['price_fine']]

axes[1].scatter(trend['midpoint'], trend['mean'] * 100, s=trend['count'] / 2,
                alpha=0.6, color='#6366f1')

# Trend line
if len(trend) > 3:
    from numpy.polynomial.polynomial import polyfit
    x_vals = trend['midpoint'].values.astype(float)
    y_vals = (trend['mean'] * 100).values.astype(float)
    coeffs = polyfit(x_vals, y_vals, 2)
    x_line = np.linspace(x_vals.min(), x_vals.max(), 100)
    y_line = sum(c * x_line**i for i, c in enumerate(coeffs))
    axes[1].plot(x_line, y_line, '-', color='#ef4444', linewidth=2.5, label='Trend (quadratic)')

axes[1].set_xlabel('Price ($)')
axes[1].set_ylabel('Cart-to-Purchase Rate (%)')
axes[1].set_title('Price Sensitivity \u2014 Higher Price = Lower Conversion?\n'
                   '(bubble size \u221d cart count)')
axes[1].legend()

plt.suptitle('Price Sensitivity Analysis',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('funnel_price_sensitivity.png', dpi=150, bbox_inches='tight')
plt.show()

---

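## 5. Conversion Performance by Time of Day

Each day-by-hour slot is treated as its own funnel: unique purchasers in that hour divided by unique viewers in that hour. A purchase can land in a later hour than the view behind it (up to about 2.5 hours later in the simulated data). Slot-level CVR therefore mixes different users and is best read as a timing signal, not a per-user conversion rate.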
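The next cell loops over all 168 day/hour combinations and filters the full frame each time, which gets slow at the ~20M-row scale of the real data. The sketch below builds the same purchasers-over-viewers grid with a single `groupby`. `hourly_cvr_grid` is a hypothetical helper. Its `min_viewers` cutoff blanks out thin slots and mirrors the 50-viewer threshold the notebook uses when picking slots. That cutoff is an assumption, not part of the original heatmap.

```python
import pandas as pd

DAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

def hourly_cvr_grid(events: pd.DataFrame, min_viewers: int = 50) -> pd.DataFrame:
    """Day-of-week by hour grid of unique purchasers / unique viewers, built with one groupby."""
    ts = pd.to_datetime(events['event_time'])
    keyed = events.assign(day=ts.dt.day_name(), hour=ts.dt.hour)
    users = (keyed.groupby(['day', 'hour', 'event_type'])['user_id']
                  .nunique()
                  .unstack(fill_value=0))
    zeros = pd.Series(0, index=users.index)
    viewers = users.get('view', zeros)
    purchasers = users.get('purchase', zeros)
    # Blank out thin slots (fewer than min_viewers viewers) instead of reporting a noisy rate
    cvr = (purchasers / viewers).where(viewers >= min_viewers)
    return cvr.unstack('hour').reindex(DAY_ORDER)

# Hypothetical toy log on Monday 2019-11-04
toy_time = pd.DataFrame({
    'event_time': pd.to_datetime(['2019-11-04 10:05', '2019-11-04 10:20',
                                  '2019-11-04 10:40', '2019-11-04 11:00']),
    'event_type': ['view', 'view', 'purchase', 'view'],
    'user_id': [1, 2, 1, 3],
})
grid = hourly_cvr_grid(toy_time, min_viewers=1)
print(grid.loc['Monday', 10])  # 0.5: 1 purchaser / 2 viewers in the 10:00 slot
```

The result can be passed to `sns.heatmap(grid * 100, ...)` in the same way as `heatmap_data`.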

In [None]:
# === Conversion heatmap by time slot ===
df['day_of_week'] = pd.to_datetime(df['event_time']).dt.day_name()
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Purchase conversion per (day of week, hour) slot
time_pivot = []
for day in day_order:
    for hour in range(24):
        subset = df[(df['day_of_week'] == day) & (df['hour'] == hour)]
        if len(subset) > 0:
            f = calculate_funnel(subset)
            time_pivot.append({
                'day': day, 'hour': hour,
                'overall_cvr': f['overall_cvr'],
                'view_to_cart': f['view_to_cart'],
                'viewers': f['viewers']
            })

time_df = pd.DataFrame(time_pivot)

# Key visualization #4: conversion heatmap by day and hour
fig, ax = plt.subplots(figsize=(14, 6))

heatmap_data = time_df.pivot(index='day', columns='hour', values='overall_cvr')
heatmap_data = heatmap_data.reindex(day_order) * 100

sns.heatmap(
    heatmap_data,
    annot=True, fmt='.1f', cmap='RdYlGn',
    linewidths=0.5, linecolor='white',
    cbar_kws={'label': 'Overall CVR (%)'},
    ax=ax
)

ax.set_title('Overall Conversion Rate by Day \u00d7 Hour\n'
             '(when do users convert the most?)',
             fontsize=13, fontweight='bold', pad=15)
ax.set_xlabel('Hour of Day')
ax.set_ylabel('Day of Week')

plt.tight_layout()
plt.savefig('funnel_time_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()

# Best/worst time slots, skipping thin slots (< 50 viewers) whose rates are noisy
eligible = time_df[time_df['viewers'] >= 50]
if eligible.empty:
    eligible = time_df
best_slot = eligible.loc[eligible['overall_cvr'].idxmax()]
worst_slot = eligible.loc[eligible['overall_cvr'].idxmin()]
print(f'Best time slot:  {best_slot["day"]} {int(best_slot["hour"]):02d}:00 (CVR: {best_slot["overall_cvr"]:.2%})')
print(f'Worst time slot: {worst_slot["day"]} {int(worst_slot["hour"]):02d}:00 (CVR: {worst_slot["overall_cvr"]:.2%})')

---

## 6. Business Insights & Recommendations

In [None]:
# === Business impact estimate ===
avg_price = df[df['event_type'] == 'purchase']['price'].mean()
total_revenue = df[df['event_type'] == 'purchase']['price'].sum()

print('=' * 65)
print('  FUNNEL ANALYSIS \u2014 EXECUTIVE SUMMARY')
print('=' * 65)

print(f'\n[1] Funnel Overview')
print(f'    Total viewers:    {funnel["viewers"]:>8,}')
print(f'    Total purchasers: {funnel["purchasers"]:>8,}')
print(f'    Overall CVR:      {funnel["overall_cvr"]:>8.2%}')
print(f'    Total revenue:    ${total_revenue:>10,.0f}')
print(f'    Avg item price:   ${avg_price:>10.2f}')  # mean price of purchased items (no order grouping)

print(f'\n[2] Bottleneck Identification')
v2c_drop = 1 - funnel['view_to_cart']
c2p_drop = funnel['cart_abandonment']
print(f'    View \u2192 Cart drop:     {v2c_drop:.1%} ({int(funnel["viewers"] * v2c_drop):,} users lost)')
print(f'    Cart \u2192 Purchase drop: {c2p_drop:.1%} ({int(funnel["carters"] * c2p_drop):,} users lost)')
primary_bottleneck = 'View\u2192Cart' if v2c_drop > c2p_drop else 'Cart\u2192Purchase'
print(f'    Primary bottleneck: {primary_bottleneck}')

print(f'\n[3] Revenue Opportunity')
# Scenario: recover 10% of cart abandoners (a relative 10% cut in abandonment),
# assuming each recovered user buys one item at the average purchased-item price
recovered_users = int(funnel['carters'] * c2p_drop * 0.10)
recovered_revenue = recovered_users * avg_price
print(f'    Cart abandoners:   {int(funnel["carters"] * c2p_drop):,} users')
print(f'    If 10% recovered:  +{recovered_users:,} purchasing users')
print(f'    Revenue potential: +${recovered_revenue:,.0f}')

print(f'\n[4] Category Insights')
best_cat = cat_funnel_df.loc[cat_funnel_df['overall_cvr'].idxmax()]
worst_cat = cat_funnel_df.loc[cat_funnel_df['overall_cvr'].idxmin()]
print(f'    Best category:  {best_cat["category"]} (CVR: {best_cat["overall_cvr"]:.2%})')
print(f'    Worst category: {worst_cat["category"]} (CVR: {worst_cat["overall_cvr"]:.2%})')
print(f'    Gap:            {(best_cat["overall_cvr"] - worst_cat["overall_cvr"])*100:+.2f}pp')
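The 10% recovery rate above is an assumption, and the revenue estimate scales linearly with it. The sketch below reports a range of recovery rates instead of a single point, which makes that dependence explicit. `revenue_opportunity` is a hypothetical helper that keeps the same one-item-per-recovered-user assumption as the cell above. The inputs in the usage line are made up and do not come from this dataset.

```python
def revenue_opportunity(carters, abandonment, avg_price, recovery_rates=(0.05, 0.10, 0.20)):
    """Extra revenue per recovery rate, assuming each recovered abandoner buys one item at avg_price."""
    abandoners = carters * abandonment
    return {rate: abandoners * rate * avg_price for rate in recovery_rates}

# Hypothetical inputs, not taken from this dataset
scenarios = revenue_opportunity(carters=1_000, abandonment=0.65, avg_price=30.0)
for rate, revenue in scenarios.items():
    print(f'{rate:>4.0%} recovered -> +${revenue:,.0f}')
```

In the notebook, the real inputs would be `funnel['carters']`, `c2p_drop`, and `avg_price`.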

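### Findings for the PM

> **1. The largest bottleneck is the View→Cart stage** (according to the `primary_bottleneck` line in the summary above; re-check it whenever the input data changes).
>
> Most users look at products without adding any to the cart.
> Likely levers are a better product-page UX, a stronger "Add to cart" CTA, or improved recommendations.
>
> **2. Cart abandonment is also significant.**
>
> A large share of users who add items to the cart never complete the purchase.
> In the price-range analysis, the cart-to-purchase rate falls as price rises.
> Discount coupons, installment payments, or a revised free-shipping threshold could help.
>
> **3. Conversion differs across categories.**
>
> Comparing high- and low-converting categories shows where product pages and pricing strategy should change for the low converters.
>
> **4. Recommended actions**
> - A/B test cart-abandonment emails (discount vs. free shipping vs. reminder)
> - Experiment with an installment-payment option for high-priced items ($50+)
> - Redesign the product-page UX for low-converting categories
> - Concentrate promotions in high-converting time slots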
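The recommended actions above are framed as tests, and the sample size each one needs depends heavily on the lift it aims to detect. The sketch below implements the standard normal-approximation sample-size formula for a two-sided, two-proportion z-test with equal allocation. It assumes carting users are the randomization unit. `sample_size_per_arm` is a hypothetical helper, and the 30% baseline and 33% target are illustration values, not results from this analysis.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Units per arm for a two-sided two-proportion z-test (normal approximation, equal arms)."""
    if p_control == p_treatment:
        raise ValueError('p_control and p_treatment must differ')
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for power = 0.80
    p_bar = (p_control + p_treatment) / 2
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p_control * (1 - p_control)
                                   + p_treatment * (1 - p_treatment)))
    return math.ceil(spread ** 2 / (p_control - p_treatment) ** 2)

# Hypothetical: baseline cart-to-purchase rate 30%, target 33%
n = sample_size_per_arm(0.30, 0.33)
print(f'{n:,} carting users per arm')  # a bit under 3,800
```

Because the required sample grows with the inverse square of the lift, halving the target lift roughly quadruples the users needed per arm.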

---

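### Methodological Limitations

| Limitation | Description | Mitigation |
|------|------|----------|
| **Simulated data** | Without the real data, events are generated from industry benchmarks. Category rates and the price effect (`price_factor`) are generator inputs, so the simulation reproduces them by construction | Replace with real data for more reliable insights |
| **Sessions ignored** | The funnel is user-based, so it does not distinguish multiple sessions by the same user | Complement with a session-ID-based funnel |
| **No causality** | The price↔conversion relationship is a correlation, not a causal effect | Confirm causality with a price A/B test |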
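The limitations table suggests a session-based funnel. The Kaggle export is described as including a `user_session` column (worth confirming in the downloaded file). When that column is present, grouping by it directly is the natural choice. The simulated data has no session field, so the sketch below derives sessions from a 30-minute inactivity gap. That gap is a common web-analytics heuristic and an assumption here, not something the original analysis specifies. `add_sessions` and `session_funnel` are hypothetical helpers.

```python
import pandas as pd

def add_sessions(events: pd.DataFrame, gap_minutes: int = 30) -> pd.DataFrame:
    """Assign a session_id: a user's new session starts after more than gap_minutes of inactivity."""
    out = events.sort_values(['user_id', 'event_time']).copy()
    gap = out.groupby('user_id')['event_time'].diff()  # NaT on each user's first event
    new_session = gap.isna() | (gap > pd.Timedelta(minutes=gap_minutes))
    out['session_id'] = new_session.cumsum()  # running count gives globally unique ids
    return out

def session_funnel(events: pd.DataFrame) -> dict:
    """Number of sessions reaching each stage, the session analogue of calculate_funnel."""
    sessions = events.groupby('event_type')['session_id'].nunique()
    viewed = int(sessions.get('view', 0))
    carted = int(sessions.get('cart', 0))
    bought = int(sessions.get('purchase', 0))
    return {
        'view_sessions': viewed,
        'cart_sessions': carted,
        'purchase_sessions': bought,
        'view_to_cart': carted / viewed if viewed else 0.0,
    }

# Hypothetical toy log: user 1 returns after a 110-minute gap, so they get two sessions
toy_log = pd.DataFrame({
    'user_id': [1, 1, 1, 2],
    'event_time': pd.to_datetime(['2019-11-01 10:00', '2019-11-01 10:10',
                                  '2019-11-01 12:00', '2019-11-01 09:00']),
    'event_type': ['view', 'cart', 'view', 'view'],
})
sessionized = add_sessions(toy_log)
print(session_funnel(sessionized))
```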

### Reference

- Dataset: Kechinov, M. (2019). *eCommerce Events History in Cosmetics Shop*. Kaggle. CC0.
- Baymard Institute (2024). *Cart Abandonment Rate Statistics*. Industry benchmark: 69.8% average.
- Google Analytics (2023). *E-commerce Benchmarks*. Average CVR: 2-3% (desktop), 1-2% (mobile).