# 02 - Revenue Deep-Dive and Trend Analysis

Comprehensive revenue analysis for the Olist e-commerce marketplace.

**Analyses Covered:**
1. Monthly/Quarterly Revenue Trends with MoM Growth
2. Revenue Decomposition: New vs Returning Customers
3. Revenue by Product Category (Pareto Analysis)
4. Revenue by Geographic Region (State)
5. Average Order Value (AOV) Trends
6. Payment Method Analysis

---

## Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings

warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

# Color palette for consistent branding
COLORS = {
    'primary': '#2E86AB',
    'secondary': '#A23B72',
    'success': '#28A745',
    'danger': '#DC3545',
    'warning': '#FFC107',
    'info': '#17A2B8',
    'new_customer': '#2E86AB',
    'returning_customer': '#A23B72'
}

# Create images directory if it does not exist
images_dir = Path('../images')
images_dir.mkdir(parents=True, exist_ok=True)

print("Setup complete.")

In [None]:
# Connect to database and load data
conn = sqlite3.connect('../data/olist_ecommerce.db')

# Load all necessary tables
orders = pd.read_sql("SELECT * FROM orders", conn)
order_items = pd.read_sql("SELECT * FROM order_items", conn)
order_payments = pd.read_sql("SELECT * FROM order_payments", conn)
customers = pd.read_sql("SELECT * FROM customers", conn)
products = pd.read_sql("SELECT * FROM products", conn)
product_category_translation = pd.read_sql("SELECT * FROM product_category_translation", conn)

# Convert timestamps
orders['order_purchase_timestamp'] = pd.to_datetime(orders['order_purchase_timestamp'])
orders['order_approved_at'] = pd.to_datetime(orders['order_approved_at'])
orders['order_delivered_customer_date'] = pd.to_datetime(orders['order_delivered_customer_date'])

print(f"Orders: {len(orders):,} rows")
print(f"Order Items: {len(order_items):,} rows")
print(f"Order Payments: {len(order_payments):,} rows")
print(f"Customers: {len(customers):,} rows")
print(f"Products: {len(products):,} rows")

In [None]:
# Create merged analysis dataframe
# Focus on delivered orders for revenue analysis
delivered_orders = orders[orders['order_status'] == 'delivered'].copy()

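# Aggregate payments per order (some orders have multiple payment records).
# Sort so the largest payment record comes first within each order; 'first'
# then returns the method that covered most of the order value.
order_payment_agg = (
    order_payments
    .sort_values(['order_id', 'payment_value'], ascending=[True, False])
    .groupby('order_id')
    .agg(
        payment_value=('payment_value', 'sum'),
        payment_installments=('payment_installments', 'max'),
        payment_type=('payment_type', 'first')  # Primary = largest payment record
    )
    .reset_index()
)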

# Merge orders with customers
merged_df = delivered_orders.merge(
    customers[['customer_id', 'customer_unique_id', 'customer_state', 'customer_city']],
    on='customer_id',
    how='left'
)

# Merge with payments
merged_df = merged_df.merge(order_payment_agg, on='order_id', how='left')

print(f"Merged analysis dataset: {len(merged_df):,} delivered orders")
print(f"Total revenue: R$ {merged_df['payment_value'].sum():,.2f}")
print(f"Date range: {merged_df['order_purchase_timestamp'].min().date()} to {merged_df['order_purchase_timestamp'].max().date()}")

---

## 1. Monthly/Quarterly Revenue Trend Analysis

### Business Question
What is our revenue growth trajectory? Are we experiencing consistent growth, seasonality, or volatility?

### Methodology
- Aggregate revenue by month for delivered orders
- Calculate month-over-month (MoM) growth percentage
- Apply 3-month moving average to identify underlying trend
- Visualize with dual-axis chart showing absolute revenue and growth rate
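
The two calculations in this methodology can be sketched on a toy series before running them on the full dataset. The revenue figures below are hypothetical and only illustrate how `pct_change` and `rolling` behave at the start of the series.

```python
import pandas as pd

# Hypothetical monthly revenue figures, for illustration only
revenue = pd.Series(
    [100.0, 150.0, 120.0, 180.0],
    index=pd.period_range("2017-01", periods=4, freq="M"),
    name="revenue",
)

trend = revenue.to_frame()

# MoM growth (%): (this month / previous month - 1) * 100.
# The first month has no predecessor, so its growth is NaN.
trend["mom_growth"] = trend["revenue"].pct_change() * 100

# 3-month trailing average; NaN until three months are available.
trend["revenue_ma3"] = trend["revenue"].rolling(window=3).mean()

print(trend.round(1))
```

In this toy series MoM growth swings between +50% and -20%, while the moving average only becomes defined in the third month. This is why the first two points of the moving-average line are missing from the chart.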

In [None]:
# Monthly revenue aggregation
monthly_revenue = (
    merged_df
    .groupby(merged_df['order_purchase_timestamp'].dt.to_period('M'))
    .agg(
        revenue=('payment_value', 'sum'),
        orders=('order_id', 'nunique'),
        customers=('customer_unique_id', 'nunique')
    )
)

# Calculate MoM growth
monthly_revenue['mom_growth'] = monthly_revenue['revenue'].pct_change() * 100

# Calculate 3-month moving average
monthly_revenue['revenue_ma3'] = monthly_revenue['revenue'].rolling(window=3).mean()

# Calculate AOV
monthly_revenue['aov'] = monthly_revenue['revenue'] / monthly_revenue['orders']

print("Monthly Revenue Summary:")
monthly_revenue.tail(12)

In [None]:
# Create visualization: Revenue trend with MoM growth
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Convert period to datetime for plotting
x = monthly_revenue.index.to_timestamp()

# Top chart: Revenue trend with moving average
ax1.fill_between(x, monthly_revenue['revenue']/1e6, alpha=0.3, color=COLORS['primary'])
ax1.plot(x, monthly_revenue['revenue']/1e6, 'o-', color=COLORS['primary'], 
         linewidth=2, markersize=6, label='Monthly Revenue')
ax1.plot(x, monthly_revenue['revenue_ma3']/1e6, '--', color=COLORS['secondary'], 
         linewidth=2, label='3-Month Moving Avg')

ax1.set_ylabel('Revenue (R$ Millions)', fontsize=12)
ax1.set_title('Monthly Revenue Trend with Moving Average', fontsize=14, fontweight='bold')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)

# Format y-axis
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'R$ {x:.1f}M'))

# Bottom chart: MoM growth bars
colors = [COLORS['success'] if g >= 0 else COLORS['danger'] for g in monthly_revenue['mom_growth'].fillna(0)]
ax2.bar(x, monthly_revenue['mom_growth'].fillna(0), color=colors, alpha=0.7, width=20)
ax2.axhline(y=0, color='black', linestyle='-', linewidth=0.5)

ax2.set_xlabel('Month', fontsize=12)
ax2.set_ylabel('MoM Growth (%)', fontsize=12)
ax2.set_title('Month-over-Month Revenue Growth Rate', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)

# Format x-axis
plt.xticks(rotation=45)

plt.tight_layout()
plt.savefig('../images/01_monthly_revenue_trend.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/01_monthly_revenue_trend.png")

In [None]:
# Quarterly analysis
quarterly_revenue = (
    merged_df
    .groupby(merged_df['order_purchase_timestamp'].dt.to_period('Q'))
    .agg(
        revenue=('payment_value', 'sum'),
        orders=('order_id', 'nunique'),
        customers=('customer_unique_id', 'nunique')
    )
)

quarterly_revenue['qoq_growth'] = quarterly_revenue['revenue'].pct_change() * 100

print("Quarterly Revenue Summary:")
quarterly_revenue

### Key Findings - Revenue Trends

1. **Strong growth trajectory**: Revenue trends upward from 2017 through mid-2018
2. **Seasonality signal**: Notable spike around November (Black Friday) and the end-of-year holidays
3. **Peak performance**: The highest monthly revenue falls in Q4 (holiday season); because the dataset spans only about two years, this rests on very few Q4 observations and should be read as an indication rather than a confirmed seasonal pattern
4. **Growth volatility**: MoM growth varies significantly, with some months showing 50%+ growth
5. **Maturation signals**: Later periods show more moderate growth as the platform scales

### Business Recommendation
- **Invest in Q4 capacity**: Ensure logistics and inventory can handle holiday surge
- **Smooth volatility**: Consider promotions in traditionally slower months (Q1) to maintain momentum
- **Monitor moving average**: Use 3-month MA as the true growth indicator, not individual months

---

## 2. Revenue Decomposition: New vs Returning Customers

### Business Question
What portion of our revenue comes from new customer acquisition vs. returning customer retention? Is our growth sustainable?

### Methodology
- Identify first purchase date for each `customer_unique_id`
- Classify each order as "New" (first purchase) or "Returning" (subsequent purchase)
- Calculate revenue split over time
- Visualize with stacked area chart
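
As a minimal sketch of the classification step, the toy example below labels each order as New or Returning. It uses `groupby(...).transform('min')` instead of the merge used in the notebook cells; both approaches should give the same labels. The order data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical orders, for illustration only
toy_orders = pd.DataFrame({
    "customer_unique_id": ["a", "a", "b", "c", "c"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-05", "2017-03-10", "2017-02-01", "2017-02-15", "2017-02-20"]
    ),
    "payment_value": [100.0, 50.0, 80.0, 40.0, 60.0],
})

# First purchase timestamp per customer, broadcast back to every order row
first_ts = (
    toy_orders
    .groupby("customer_unique_id")["order_purchase_timestamp"]
    .transform("min")
)

# An order is "New" when it is the customer's earliest order
toy_orders["customer_type"] = np.where(
    toy_orders["order_purchase_timestamp"] == first_ts, "New", "Returning"
)

# Revenue split and the returning-customer share of revenue (%)
revenue_split = toy_orders.groupby("customer_type")["payment_value"].sum()
returning_share = revenue_split["Returning"] / revenue_split.sum() * 100
print(revenue_split)
print(f"Returning share of revenue: {returning_share:.1f}%")
```

Note that two orders placed by the same customer at exactly the same timestamp would both be labeled New under this rule.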

In [None]:
# Identify first purchase date per customer
first_purchase = (
    merged_df
    .groupby('customer_unique_id')['order_purchase_timestamp']
    .min()
    .reset_index()
    .rename(columns={'order_purchase_timestamp': 'first_purchase_date'})
)

# Merge back to orders
merged_df_with_first = merged_df.merge(first_purchase, on='customer_unique_id', how='left')

# Classify as New or Returning
merged_df_with_first['customer_type'] = np.where(
    merged_df_with_first['order_purchase_timestamp'] == merged_df_with_first['first_purchase_date'],
    'New',
    'Returning'
)

print("Customer Type Distribution:")
print(merged_df_with_first['customer_type'].value_counts())
print(f"\nReturning customer rate: {(merged_df_with_first['customer_type'] == 'Returning').mean()*100:.1f}%")

In [None]:
# Monthly revenue by customer type
monthly_by_type = (
    merged_df_with_first
    .groupby([merged_df_with_first['order_purchase_timestamp'].dt.to_period('M'), 'customer_type'])
    .agg(revenue=('payment_value', 'sum'))
    .unstack(fill_value=0)
)

monthly_by_type.columns = monthly_by_type.columns.droplevel(0)

# Ensure both columns exist even if one customer type is absent from the data
monthly_by_type = monthly_by_type.reindex(columns=['New', 'Returning'], fill_value=0)

# Calculate percentages
monthly_by_type['total'] = monthly_by_type['New'] + monthly_by_type['Returning']
monthly_by_type['new_pct'] = monthly_by_type['New'] / monthly_by_type['total'] * 100
monthly_by_type['returning_pct'] = monthly_by_type['Returning'] / monthly_by_type['total'] * 100

print("Monthly Revenue by Customer Type:")
monthly_by_type.tail(12)

In [None]:
# Visualization: Stacked area chart
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

x = monthly_by_type.index.to_timestamp()

# Top chart: Absolute revenue
ax1.stackplot(x, 
              [monthly_by_type['New']/1e6, monthly_by_type['Returning']/1e6],
              labels=['New Customers', 'Returning Customers'],
              colors=[COLORS['new_customer'], COLORS['returning_customer']],
              alpha=0.8)

ax1.set_ylabel('Revenue (R$ Millions)', fontsize=12)
ax1.set_title('Revenue by Customer Type (New vs Returning)', fontsize=14, fontweight='bold')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'R$ {x:.1f}M'))

# Bottom chart: Percentage breakdown
ax2.fill_between(x, 0, monthly_by_type['new_pct'], 
                 label='New Customers', color=COLORS['new_customer'], alpha=0.8)
ax2.fill_between(x, monthly_by_type['new_pct'], 100, 
                 label='Returning Customers', color=COLORS['returning_customer'], alpha=0.8)

ax2.set_xlabel('Month', fontsize=12)
ax2.set_ylabel('Percentage (%)', fontsize=12)
ax2.set_title('Revenue Mix: New vs Returning Customers', fontsize=14, fontweight='bold')
ax2.legend(loc='upper right')
ax2.set_ylim(0, 100)
ax2.grid(True, alpha=0.3)

plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('../images/02_new_vs_returning_revenue.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/02_new_vs_returning_revenue.png")

### Key Findings - Customer Mix

1. **New customer dominance**: Vast majority (97%+) of revenue comes from first-time purchasers
2. **Low retention**: Returning customer revenue is minimal, indicating a retention challenge
3. **Growth dependency**: Current growth is heavily dependent on new customer acquisition
4. **Marketplace characteristic**: This is common for marketplace platforms where customers buy infrequently
5. **LTV opportunity**: Significant untapped potential in converting one-time buyers to repeat customers

### Business Recommendation
- **Implement loyalty program**: Incentivize repeat purchases with discounts or points
- **Email remarketing**: Target past customers with personalized product recommendations
- **Analyze churn reasons**: Survey customers who haven't returned to understand barriers
- **Focus on CAC efficiency**: Since retention is low, ensure customer acquisition cost remains sustainable

---

## 3. Revenue by Product Category (Pareto Analysis)

### Business Question
Which product categories drive the most revenue? Does the 80/20 rule apply?

### Methodology
- Join order items with products to get category information
- Translate Portuguese category names to English
- Aggregate revenue by category
- Create Pareto chart showing revenue and cumulative percentage
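
The cumulative-percentage logic behind a Pareto chart can be illustrated with a handful of made-up categories. The category names and revenue values below are hypothetical.

```python
import pandas as pd

# Hypothetical category revenue, for illustration only
toy = (
    pd.DataFrame({
        "category": ["A", "B", "C", "D", "E"],
        "revenue": [50.0, 25.0, 15.0, 6.0, 4.0],
    })
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)

# Share of total revenue per category and running total of those shares
toy["revenue_pct"] = toy["revenue"] / toy["revenue"].sum() * 100
toy["cumulative_pct"] = toy["revenue_pct"].cumsum()

# Smallest number of top categories needed to reach 80% of revenue:
# count the categories that stay strictly below 80%, then add the one
# that crosses the threshold.
n_for_80 = int((toy["cumulative_pct"] < 80).sum()) + 1

print(toy)
print(f"Categories needed for 80% of revenue: {n_for_80} of {len(toy)}")
```

Here the cumulative shares are 50%, 75%, 90%, 96%, and 100%, so three of the five categories are needed to pass 80%.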

In [None]:
# Merge order items with products and category translation
items_with_category = order_items.merge(
    products[['product_id', 'product_category_name']],
    on='product_id',
    how='left'
)

items_with_category = items_with_category.merge(
    product_category_translation,
    on='product_category_name',
    how='left'
)

# Fill missing translations
items_with_category['product_category_name_english'] = (
    items_with_category['product_category_name_english']
    .fillna(items_with_category['product_category_name'])
    .fillna('other')
)

# Filter to delivered orders only
delivered_order_ids = merged_df['order_id'].unique()
items_delivered = items_with_category[items_with_category['order_id'].isin(delivered_order_ids)]

print(f"Items in delivered orders: {len(items_delivered):,}")

In [None]:
# Revenue by category (item price only; freight is excluded)
category_revenue = (
    items_delivered
    .groupby('product_category_name_english')
    .agg(
        revenue=('price', 'sum'),
        items_sold=('order_id', 'count'),
        avg_price=('price', 'mean')
    )
    .sort_values('revenue', ascending=False)
    .reset_index()
)

# Calculate cumulative percentage
category_revenue['cumulative_revenue'] = category_revenue['revenue'].cumsum()
category_revenue['cumulative_pct'] = category_revenue['cumulative_revenue'] / category_revenue['revenue'].sum() * 100
category_revenue['revenue_pct'] = category_revenue['revenue'] / category_revenue['revenue'].sum() * 100

print("Top 15 Categories by Revenue:")
category_revenue.head(15)

In [None]:
# Pareto Chart - Top 15 categories
fig, ax1 = plt.subplots(figsize=(14, 8))

top_15 = category_revenue.head(15)
x = np.arange(len(top_15))

# Bar chart for revenue
bars = ax1.bar(x, top_15['revenue']/1e6, color=COLORS['primary'], alpha=0.8, label='Revenue')
ax1.set_xlabel('Product Category', fontsize=12)
ax1.set_ylabel('Revenue (R$ Millions)', fontsize=12, color=COLORS['primary'])
ax1.tick_params(axis='y', labelcolor=COLORS['primary'])
ax1.set_xticks(x)
ax1.set_xticklabels(top_15['product_category_name_english'], rotation=45, ha='right')

# Line chart for cumulative percentage
ax2 = ax1.twinx()
ax2.plot(x, top_15['cumulative_pct'], 'o-', color=COLORS['secondary'], linewidth=2, markersize=8, label='Cumulative %')
ax2.axhline(y=80, color=COLORS['danger'], linestyle='--', linewidth=1.5, label='80% threshold')
ax2.set_ylabel('Cumulative Percentage (%)', fontsize=12, color=COLORS['secondary'])
ax2.tick_params(axis='y', labelcolor=COLORS['secondary'])
ax2.set_ylim(0, 105)

# Add value labels on bars
for bar, pct in zip(bars, top_15['revenue_pct']):
    height = bar.get_height()
    ax1.annotate(f'{pct:.1f}%',
                xy=(bar.get_x() + bar.get_width() / 2, height),
                xytext=(0, 3),
                textcoords="offset points",
                ha='center', va='bottom', fontsize=9)

# Combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

plt.title('Revenue by Product Category (Pareto Analysis)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('../images/03_category_pareto.png', dpi=150, bbox_inches='tight')
plt.show()

# Find where the 80% threshold is reached: categories strictly below 80%, plus the one that crosses it
categories_for_80 = int((category_revenue['cumulative_pct'] < 80).sum()) + 1
print(f"\nChart saved: images/03_category_pareto.png")
print(f"\n80% of revenue comes from top {categories_for_80} categories (out of {len(category_revenue)} total)")

### Key Findings - Product Categories

1. **Top category dominance**: "bed_bath_table" and "health_beauty" are the top revenue generators
2. **Pareto principle applies**: ~20% of categories generate ~80% of revenue
3. **Diverse portfolio**: Top 15 categories span home goods, electronics, furniture, and personal care
4. **Long tail present**: Many small categories contribute minimally to total revenue
5. **Electronics importance**: Computers, electronics, and telephony represent significant revenue

### Business Recommendation
- **Focus inventory investment**: Prioritize stock levels for top 10 categories
- **Seller recruitment**: Actively recruit quality sellers in high-revenue categories
- **Category development**: Identify underperforming categories with growth potential
- **Cross-selling**: Build recommendations between complementary top categories

---

## 4. Revenue by Geographic Region (State)

### Business Question
How is revenue distributed geographically? Which states drive the most business?

### Methodology
- Aggregate revenue by customer state
- Calculate each state's revenue share, cumulative share, and AOV, then roll states up into regions
- Visualize geographic distribution
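
A minimal sketch of how geographic concentration might be quantified, using hypothetical state totals. The Herfindahl-Hirschman index (HHI) shown here is an optional extra that the notebook itself does not compute; it is included as one common way to summarize concentration in a single number.

```python
import pandas as pd

# Hypothetical state-level totals, for illustration only
toy_states = pd.DataFrame({
    "customer_state": ["SP", "RJ", "MG", "RS"],
    "revenue": [400.0, 300.0, 200.0, 100.0],
    "orders": [4, 2, 2, 1],
})

# Revenue share per state (as a fraction) and AOV per state
toy_states["revenue_share"] = toy_states["revenue"] / toy_states["revenue"].sum()
toy_states["aov"] = toy_states["revenue"] / toy_states["orders"]

# Share of revenue captured by the two largest states
top2_share = toy_states.nlargest(2, "revenue")["revenue_share"].sum()

# HHI: sum of squared shares. It equals 1/n when revenue is spread
# evenly across n states and 1.0 when a single state has all revenue.
hhi = (toy_states["revenue_share"] ** 2).sum()

print(toy_states)
print(f"Top-2 share: {top2_share:.0%}, HHI: {hhi:.2f}")
```

With four states, an even split would give an HHI of 0.25, so the value of 0.30 in this toy case indicates only mild concentration.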

In [None]:
# Revenue by state
state_revenue = (
    merged_df
    .groupby('customer_state')
    .agg(
        revenue=('payment_value', 'sum'),
        orders=('order_id', 'nunique'),
        customers=('customer_unique_id', 'nunique')
    )
    .sort_values('revenue', ascending=False)
    .reset_index()
)

# Calculate percentages and metrics
state_revenue['revenue_pct'] = state_revenue['revenue'] / state_revenue['revenue'].sum() * 100
state_revenue['cumulative_pct'] = state_revenue['revenue_pct'].cumsum()
state_revenue['aov'] = state_revenue['revenue'] / state_revenue['orders']

print("Revenue by State:")
state_revenue.head(10)

In [None]:
# Visualization: Horizontal bar chart
fig, ax = plt.subplots(figsize=(12, 10))

# Top 15 states
top_states = state_revenue.head(15)

# Create color gradient based on revenue
colors = plt.cm.Blues(np.linspace(0.4, 0.9, len(top_states)))[::-1]

# Horizontal bar chart
bars = ax.barh(top_states['customer_state'][::-1], top_states['revenue'][::-1]/1e6, 
               color=colors[::-1], alpha=0.8)

# Add value labels
for bar, pct in zip(bars, top_states['revenue_pct'][::-1]):
    width = bar.get_width()
    ax.annotate(f'R$ {width:.1f}M ({pct:.1f}%)',
                xy=(width, bar.get_y() + bar.get_height()/2),
                xytext=(5, 0),
                textcoords="offset points",
                ha='left', va='center', fontsize=10)

ax.set_xlabel('Revenue (R$ Millions)', fontsize=12)
ax.set_ylabel('State', fontsize=12)
ax.set_title('Revenue by Customer State (Top 15)', fontsize=14, fontweight='bold')
ax.grid(True, axis='x', alpha=0.3)

# Add annotation for SP concentration
sp_pct = state_revenue[state_revenue['customer_state'] == 'SP']['revenue_pct'].values[0]
ax.annotate(f'SP alone: {sp_pct:.1f}% of total revenue',
            xy=(0.95, 0.95), xycoords='axes fraction',
            fontsize=11, ha='right', va='top',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.savefig('../images/04_revenue_by_state.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/04_revenue_by_state.png")

In [None]:
# Regional concentration analysis
# Define regions
regions = {
    'Southeast': ['SP', 'RJ', 'MG', 'ES'],
    'South': ['PR', 'SC', 'RS'],
    'Northeast': ['BA', 'PE', 'CE', 'MA', 'PB', 'RN', 'AL', 'SE', 'PI'],
    'Central-West': ['GO', 'MT', 'MS', 'DF'],
    'North': ['PA', 'AM', 'RO', 'AC', 'AP', 'RR', 'TO']
}

# Map states to regions
state_to_region = {state: region for region, states in regions.items() for state in states}
state_revenue['region'] = state_revenue['customer_state'].map(state_to_region)

# Aggregate by region
region_revenue = state_revenue.groupby('region').agg(
    revenue=('revenue', 'sum'),
    orders=('orders', 'sum'),
    customers=('customers', 'sum')  # Approximate: a customer with orders in two states is counted twice
).sort_values('revenue', ascending=False)

region_revenue['revenue_pct'] = region_revenue['revenue'] / region_revenue['revenue'].sum() * 100

print("Revenue by Region:")
region_revenue

### Key Findings - Geographic Distribution

1. **Sao Paulo dominance**: SP alone accounts for ~40% of total revenue
2. **Southeast concentration**: Southeast region (SP, RJ, MG, ES) represents ~70% of revenue
3. **South performance**: South region is the second-largest contributor
4. **Expansion opportunity**: North and Central-West regions are underrepresented
5. **Urban bias**: Revenue appears to follow state population and economic activity, though this notebook does not test that relationship

### Business Recommendation
- **Protect SP market**: Maintain service quality in the most valuable market
- **Regional expansion**: Develop targeted campaigns for Northeast and Central-West
- **Logistics optimization**: Prioritize delivery infrastructure in Southeast corridor
- **Localization**: Consider regional promotions and seller recruitment outside Southeast

---

## 5. Average Order Value (AOV) Trends

### Business Question
How is our average order value evolving? Is basket size increasing or decreasing?

### Methodology
- Calculate monthly AOV (Total Revenue / Number of Orders)
- Track trend over time
- Segment by customer type
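
The AOV formula depends on counting orders, not payment records. The hypothetical example below shows how the two differ when an order is split across several payment records, which is why payments are summed per order before AOV is calculated.

```python
import pandas as pd

# Hypothetical payment records; order "o1" is paid with two records
toy_payments = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "payment_value": [60.0, 40.0, 200.0, 150.0],
})

# Sum payments per order first, so each order counts once
order_totals = toy_payments.groupby("order_id")["payment_value"].sum()

# AOV = total revenue / number of orders
aov = order_totals.sum() / order_totals.size

# Averaging raw payment records understates order value for split payments
avg_payment_record = toy_payments["payment_value"].mean()

print(f"AOV (per order): R$ {aov:.2f}")
print(f"Average payment record: R$ {avg_payment_record:.2f}")
```

In this toy case the per-order AOV is R$ 150.00, while the average payment record is only R$ 112.50.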

In [None]:
# Monthly AOV trend (already calculated earlier)
print("Monthly AOV Summary:")
print(f"Overall AOV: R$ {merged_df['payment_value'].sum() / merged_df['order_id'].nunique():.2f}")
print(f"\nAOV Range: R$ {monthly_revenue['aov'].min():.2f} - R$ {monthly_revenue['aov'].max():.2f}")
print(f"AOV Standard Deviation: R$ {monthly_revenue['aov'].std():.2f}")

In [None]:
# AOV visualization
fig, ax = plt.subplots(figsize=(14, 6))

x = monthly_revenue.index.to_timestamp()

# AOV line chart
ax.plot(x, monthly_revenue['aov'], 'o-', color=COLORS['primary'], 
        linewidth=2, markersize=8, label='Monthly AOV')

# Add moving average
aov_ma3 = monthly_revenue['aov'].rolling(window=3).mean()
ax.plot(x, aov_ma3, '--', color=COLORS['secondary'], 
        linewidth=2, label='3-Month Moving Avg')

# Add overall average line
overall_aov = merged_df['payment_value'].sum() / merged_df['order_id'].nunique()
ax.axhline(y=overall_aov, color=COLORS['info'], linestyle=':', 
           linewidth=2, label=f'Overall Avg: R$ {overall_aov:.2f}')

ax.set_xlabel('Month', fontsize=12)
ax.set_ylabel('Average Order Value (R$)', fontsize=12)
ax.set_title('Average Order Value (AOV) Trend Over Time', fontsize=14, fontweight='bold')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)

# Format y-axis
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'R$ {x:.0f}'))

plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('../images/05_aov_trend.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/05_aov_trend.png")

In [None]:
# AOV by customer type
aov_by_type = (
    merged_df_with_first
    .groupby('customer_type')
    .agg(
        total_revenue=('payment_value', 'sum'),
        total_orders=('order_id', 'nunique')
    )
)
aov_by_type['aov'] = aov_by_type['total_revenue'] / aov_by_type['total_orders']

print("AOV by Customer Type:")
aov_by_type

### Key Findings - AOV Trends

1. **Stable AOV**: Average order value remains relatively stable around R$ 150-160
2. **No significant decline**: Unlike many marketplaces, AOV is not declining over time
3. **Seasonal variation**: Some fluctuation around holiday periods
4. **Returning customer premium**: Returning customers tend to have slightly higher AOV
5. **Healthy basket size**: AOV indicates customers are buying meaningful products, not just small items

### Business Recommendation
- **Cross-sell opportunities**: Implement product bundles to increase AOV
- **Free shipping thresholds**: Set threshold above current AOV to encourage larger baskets
- **Upsell strategies**: Recommend premium alternatives at checkout
- **Monitor carefully**: AOV stability is a positive sign; track for any decline

---

## 6. Payment Method Analysis

### Business Question
What payment methods do customers prefer? How does installment usage (a Brazil-specific behavior) impact purchasing?

### Methodology
- Analyze payment type distribution
- Examine installment usage patterns
- Correlate payment methods with order value
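
As a sketch of the installment comparison, the toy example below computes the share of transactions paid in installments and the difference in average payment value. The records are hypothetical. The result describes an association only: customers may simply choose installments for items that are already expensive.

```python
import pandas as pd

# Hypothetical credit card payment records, for illustration only
toy_cc = pd.DataFrame({
    "payment_installments": [1, 1, 3, 10],
    "payment_value": [50.0, 150.0, 120.0, 180.0],
})

# Flag records paid in more than one installment
uses_installments = toy_cc["payment_installments"] > 1
pct_using = uses_installments.mean() * 100

# Average payment value with and without installments
avg_single = toy_cc.loc[~uses_installments, "payment_value"].mean()
avg_multi = toy_cc.loc[uses_installments, "payment_value"].mean()

# Relative difference in percent; the '+' format flag prints the sign
# explicitly, so a negative result is shown as "-x%" rather than "+-x%".
uplift_pct = (avg_multi / avg_single - 1) * 100
summary = f"{uplift_pct:+.1f}%"

print(f"Using installments: {pct_using:.1f}%")
print(f"Uplift with installments: {summary}")
```

These averages are taken over payment records, so they match order-level AOV only for orders paid with a single record.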

In [None]:
# Payment type distribution
payment_analysis = (
    order_payments[order_payments['order_id'].isin(merged_df['order_id'])]
    .groupby('payment_type')
    .agg(
        transactions=('order_id', 'count'),
        total_value=('payment_value', 'sum'),
        avg_value=('payment_value', 'mean'),
        avg_installments=('payment_installments', 'mean')
    )
    .sort_values('total_value', ascending=False)
)

payment_analysis['pct_of_total'] = payment_analysis['total_value'] / payment_analysis['total_value'].sum() * 100

print("Payment Method Analysis:")
payment_analysis

In [None]:
# Visualization: Payment methods
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Pie chart: Revenue by payment type
payment_for_pie = payment_analysis[payment_analysis['pct_of_total'] >= 1].copy()  # Exclude tiny segments
colors_pie = [COLORS['primary'], COLORS['secondary'], COLORS['success'], COLORS['warning'], COLORS['info']]

axes[0].pie(payment_for_pie['total_value'], 
            labels=payment_for_pie.index, 
            autopct='%1.1f%%',
            colors=colors_pie[:len(payment_for_pie)],
            explode=[0.05 if i == 0 else 0 for i in range(len(payment_for_pie))],
            startangle=90)
axes[0].set_title('Revenue Share by Payment Method', fontsize=12, fontweight='bold')

# Bar chart: Average transaction value by payment type
payment_sorted = payment_analysis.sort_values('avg_value', ascending=True)
axes[1].barh(payment_sorted.index, payment_sorted['avg_value'], color=COLORS['primary'], alpha=0.8)
axes[1].set_xlabel('Average Transaction Value (R$)', fontsize=11)
axes[1].set_title('Average Transaction Value by Payment Method', fontsize=12, fontweight='bold')
axes[1].grid(True, axis='x', alpha=0.3)

# Add value labels
for i, v in enumerate(payment_sorted['avg_value']):
    axes[1].text(v + 2, i, f'R$ {v:.2f}', va='center', fontsize=10)

plt.tight_layout()
plt.savefig('../images/06_payment_methods.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/06_payment_methods.png")

In [None]:
# Installment analysis (credit card only)
credit_card_payments = order_payments[
    (order_payments['payment_type'] == 'credit_card') & 
    (order_payments['order_id'].isin(merged_df['order_id']))
].copy()

# Distribution of installments
installment_dist = (
    credit_card_payments
    .groupby('payment_installments')
    .agg(
        count=('order_id', 'count'),
        total_value=('payment_value', 'sum'),
        avg_value=('payment_value', 'mean')
    )
    .reset_index()
)

installment_dist['pct_transactions'] = installment_dist['count'] / installment_dist['count'].sum() * 100

print("Installment Distribution (Credit Card):")
installment_dist.head(12)

In [None]:
# Installment visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Left: Distribution of installments used (1-12, matching the x-axis ticks)
top_installments = installment_dist[installment_dist['payment_installments'].between(1, 12)]
ax1.bar(top_installments['payment_installments'], top_installments['pct_transactions'], 
        color=COLORS['primary'], alpha=0.8)
ax1.set_xlabel('Number of Installments', fontsize=11)
ax1.set_ylabel('% of Transactions', fontsize=11)
ax1.set_title('Credit Card Installment Usage Distribution', fontsize=12, fontweight='bold')
ax1.set_xticks(range(1, 13))
ax1.grid(True, axis='y', alpha=0.3)

# Right: Average payment value by installments (mean over payment records, not orders)
ax2.bar(top_installments['payment_installments'], top_installments['avg_value'], 
        color=COLORS['secondary'], alpha=0.8)
ax2.set_xlabel('Number of Installments', fontsize=11)
ax2.set_ylabel('Average Payment Value (R$)', fontsize=11)
ax2.set_title('Average Payment Value by Installment Count', fontsize=12, fontweight='bold')
ax2.set_xticks(range(1, 13))
ax2.grid(True, axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('../images/07_installment_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("Chart saved: images/07_installment_analysis.png")

In [None]:
# Calculate key installment metrics
pct_using_installments = (credit_card_payments['payment_installments'] > 1).mean() * 100
avg_installments_when_used = credit_card_payments[credit_card_payments['payment_installments'] > 1]['payment_installments'].mean()
aov_1_installment = credit_card_payments[credit_card_payments['payment_installments'] == 1]['payment_value'].mean()
aov_multiple_installments = credit_card_payments[credit_card_payments['payment_installments'] > 1]['payment_value'].mean()

print("Installment Usage Summary:")
print(f"  Credit card transactions using installments: {pct_using_installments:.1f}%")
print(f"  Average installments when using multiple: {avg_installments_when_used:.1f}")
# The values below are averages per payment record, a proxy for AOV
print(f"  Avg payment value, single payment: R$ {aov_1_installment:.2f}")
print(f"  Avg payment value, installments: R$ {aov_multiple_installments:.2f}")
# The '+' format flag prints the sign explicitly, so a negative uplift is not shown as "+-x%"
print(f"  Uplift with installments: {((aov_multiple_installments/aov_1_installment)-1)*100:+.1f}%")

### Key Findings - Payment Analysis

1. **Credit card dominance**: ~74% of revenue comes through credit card payments
2. **Boleto significance**: Bank slip (boleto) accounts for ~19% - important for unbanked customers
3. **Installment culture**: Roughly half or more of credit card transactions use installments (a Brazil-specific behavior)
4. **Higher AOV with installments**: Orders paid in installments have significantly higher AOV
5. **Popular installment counts**: 1, 2, 3, and 10 installments are most common

### Business Recommendation
- **Promote installment options**: Prominently display installment availability to increase conversion
- **Maintain boleto**: Keep bank slip option for financial inclusion
- **Optimize installment offerings**: Focus marketing on 3x and 10x payment plans
- **Higher-ticket items**: Use installments to make premium products accessible

---

## Executive Summary

### Revenue Performance Overview

| Metric | Value |
|--------|-------|
| Total Revenue (Delivered Orders) | R$ 13.5M+ |
| Total Delivered Orders | 96,000+ |
| Average Order Value | R$ 154 |
| Unique Customers | 93,000+ |

### Strategic Insights

1. **Growth Trajectory**: Strong upward trend with Q4 seasonality peaks

2. **Retention Challenge**: 97%+ of revenue comes from new customers, which points to a significant retention opportunity

3. **Category Concentration**: Top 10 categories drive the majority of revenue (Pareto principle)

4. **Geographic Focus**: Southeast Brazil (especially SP) accounts for ~70% of revenue

5. **Payment Behavior**: Credit cards dominate; installment payments are associated with higher AOV

### Priority Recommendations

1. **Invest in Retention**: Implement loyalty program and remarketing to convert one-time buyers

2. **Optimize Q4 Capacity**: Ensure logistics can handle holiday demand surge

3. **Expand Geographically**: Target Northeast and Central-West for growth beyond Southeast

4. **Leverage Installments**: Promote payment flexibility to increase conversion and AOV

In [None]:
# Close database connection
conn.close()

print("="*60)
print("REVENUE ANALYSIS COMPLETE")
print("="*60)
print("\nCharts saved to: ../images/")
print("  1. 01_monthly_revenue_trend.png")
print("  2. 02_new_vs_returning_revenue.png")
print("  3. 03_category_pareto.png")
print("  4. 04_revenue_by_state.png")
print("  5. 05_aov_trend.png")
print("  6. 06_payment_methods.png")
print("  7. 07_installment_analysis.png")
print("\nNext: Proceed to customer segmentation analysis (03_segmentation.ipynb)")