# Price Elasticity Analysis

This notebook demonstrates price elasticity calculation for retail products using the M5 dataset.

## Overview

**Price Elasticity of Demand** measures how sensitive demand is to price changes:

$$\epsilon = \frac{\% \Delta Q}{\% \Delta P} = \frac{\Delta Q / Q}{\Delta P / P}$$

### Interpretation:
- **Elastic** (|ε| > 1): Demand highly sensitive to price → Lower prices increase revenue
- **Unit Elastic** (|ε| = 1): Proportional response → Revenue unchanged by small price changes
- **Inelastic** (|ε| < 1): Demand relatively insensitive → Higher prices increase revenue
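
The revenue rules follow from differentiating revenue $R = P \cdot Q(P)$ and substituting the point elasticity $\epsilon = (dQ/dP) \cdot (P/Q)$:

$$\frac{dR}{dP} = Q + P\frac{dQ}{dP} = Q\left(1 + \frac{dQ}{dP} \cdot \frac{P}{Q}\right) = Q\,(1 + \epsilon)$$

With downward-sloping demand ($\epsilon < 0$), $dR/dP < 0$ when $\epsilon < -1$ (a price cut raises revenue), $dR/dP > 0$ when $-1 < \epsilon < 0$ (a price increase raises revenue), and $dR/dP = 0$ at $\epsilon = -1$. These statements hold locally, for small price changes.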

### Methods:
1. **Log-log regression**: $\ln(Q) = a + b \cdot \ln(P)$ → $\epsilon = b$
2. **Arc elasticity**: $(\Delta Q / Q_{avg}) / (\Delta P / P_{avg})$
3. **Point elasticity**: $(dQ/dP) \cdot (P/Q)$
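
To make the three definitions concrete, here is a small NumPy sketch on synthetic data drawn from a constant-elasticity demand curve $Q = A P^{\epsilon}$. The curve, its parameters (`TRUE_EPS`, `A`), and the helper names are assumptions for illustration, not part of `src.pricing.elasticity`. For such a curve all three methods should recover roughly the same ε.

```python
import numpy as np

# Synthetic constant-elasticity demand Q = A * P**eps, with assumed true eps = -1.5
rng = np.random.default_rng(0)
TRUE_EPS, A = -1.5, 100.0
prices = rng.uniform(2.0, 6.0, size=200)
quantities = A * prices**TRUE_EPS * np.exp(rng.normal(0.0, 0.05, size=200))

# 1. Log-log regression: the slope b of ln(Q) on ln(P) is the elasticity
b, intercept = np.polyfit(np.log(prices), np.log(quantities), deg=1)

# 2. Arc elasticity between two observations, using midpoint averages
def arc_elasticity(p1, q1, p2, q2):
    pct_dq = (q2 - q1) / ((q1 + q2) / 2)
    pct_dp = (p2 - p1) / ((p1 + p2) / 2)
    return pct_dq / pct_dp

# 3. Point elasticity (dQ/dP) * (P/Q), using the analytic derivative of Q = A * P**eps
def point_elasticity(p, a_coef=A, eps=TRUE_EPS):
    q = a_coef * p**eps
    dq_dp = a_coef * eps * p**(eps - 1)
    return dq_dp * p / q

arc = arc_elasticity(3.0, A * 3.0**TRUE_EPS, 3.3, A * 3.3**TRUE_EPS)
point = point_elasticity(4.0)
print(f"log-log slope: {b:.3f}, arc: {arc:.3f}, point: {point:.3f}")
```

Arc elasticity only approximates ε here because it averages over a finite price step. The log-log slope is presumably what `ElasticityAnalyzer(method='log-log')` estimates below.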

In [None]:
# Imports
import sys
from pathlib import Path

# Add project root to path (assumes the notebook runs from a direct subdirectory of the project root)
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

from src.pricing.elasticity import ElasticityAnalyzer
from src.utils.helpers import load_config, setup_logging

# Setup
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
logger = setup_logging()
config = load_config()

print("✓ Imports complete")

## 1. Load Data

In [None]:
# Load processed data
data_path = project_root / 'data' / 'processed' / 'price_sales_merged.csv'
df = pd.read_csv(data_path, parse_dates=['date'])

print(f"Loaded {len(df):,} observations")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print(f"Products: {df['item_id'].nunique()}")
print(f"Stores: {df['store_id'].nunique()}")
print(f"\nFeatures ({len(df.columns)}): {list(df.columns[:10])}...")

df.head()

In [None]:
# Data quality check
print("Data Quality Summary:")
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Zero prices: {(df['sell_price'] == 0).sum()}")
print(f"Zero sales: {(df['sales'] == 0).sum()} ({(df['sales'] == 0).sum() / len(df) * 100:.1f}%)")
print(f"\nPrice statistics:")
print(df['sell_price'].describe())
print(f"\nSales statistics:")
print(df['sales'].describe())
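
Log-log fits need two things per store-item series: positive sales (since $\ln 0$ is undefined) and some variation in `sell_price` (a constant price leaves the slope unidentified). A quick per-series check along those lines might look like the sketch below; `price_variation_report` is a hypothetical helper, and the toy rows are made up, though the column names match this notebook.

```python
import pandas as pd

def price_variation_report(df, group_cols=('store_id', 'item_id')):
    """Per-series counts relevant to log-log elasticity fits (hypothetical helper)."""
    grouped = df.groupby(list(group_cols))
    report = pd.DataFrame({
        'n_obs': grouped.size(),                              # rows per series
        'n_price_levels': grouped['sell_price'].nunique(),     # distinct prices observed
        'zero_sales_share': grouped['sales'].apply(lambda s: (s == 0).mean()),
    })
    return report.reset_index()

toy = pd.DataFrame({
    'store_id': ['CA_1'] * 4 + ['TX_1'] * 2,
    'item_id': ['FOODS_1_001'] * 6,
    'sell_price': [2.0, 2.0, 2.5, 2.5, 3.0, 3.0],
    'sales': [3, 0, 2, 1, 0, 0],
})
print(price_variation_report(toy))
```

On the real data, `price_variation_report(df)` would flag series with a single price level, which no log-log regression can fit.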

## 2. Calculate Own-Price Elasticity

In [None]:
# Initialize analyzer
analyzer = ElasticityAnalyzer(method='log-log', min_observations=30)

# Calculate elasticities for all products
elasticities = analyzer.calculate_elasticities_batch(
    df=df,
    group_cols=['store_id', 'item_id']
)

print(f"\nCalculated elasticities for {len(elasticities)} store-item pairs")
print(f"Valid results: {elasticities['valid'].sum()} ({elasticities['valid'].sum() / len(elasticities) * 100:.1f}%)")

elasticities.head(10)

In [None]:
# Filter valid results
valid_elasticities = elasticities[elasticities['valid'] == True].copy()

print(f"Valid elasticity statistics:")
print(valid_elasticities['elasticity'].describe())
print(f"\nR² statistics:")
print(valid_elasticities['r_squared'].describe())
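
`calculate_elasticities_batch` lives in `src.pricing.elasticity` and isn't shown in this notebook. As a rough mental model only, a per-series log-log fit with a minimum-observation filter could look like the sketch below. The function name, the choice to drop zero-sales rows, and the validity rule are all assumptions; the project's implementation may differ.

```python
import numpy as np
import pandas as pd

def loglog_elasticity_by_group(df, group_cols, min_observations=30):
    """Hypothetical stand-in for a batch log-log fit, not the project's implementation."""
    rows = []
    for keys, g in df.groupby(group_cols):
        # ln() needs positive values, so this sketch drops zero-sales and non-positive prices
        g = g[(g['sell_price'] > 0) & (g['sales'] > 0)]
        x = np.log(g['sell_price'].to_numpy())
        y = np.log(g['sales'].to_numpy())
        # Require enough rows plus variation in both price and sales
        valid = len(g) >= max(min_observations, 2) and np.ptp(x) > 0 and np.ptp(y) > 0
        slope = r_squared = np.nan
        if valid:
            slope, intercept = np.polyfit(x, y, deg=1)
            residuals = y - (intercept + slope * x)
            r_squared = 1 - residuals.var() / y.var()
        rows.append({**dict(zip(group_cols, keys)), 'elasticity': slope,
                     'r_squared': r_squared, 'n_obs': len(g), 'valid': bool(valid)})
    return pd.DataFrame(rows)

# Synthetic example: two series with assumed true elasticities -2.0 and -0.5
rng = np.random.default_rng(1)
frames = []
for item_id, eps in [('FOODS_1_001', -2.0), ('HOUSEHOLD_1_001', -0.5)]:
    price = rng.uniform(1.0, 5.0, 60)
    sales = 50 * price**eps * np.exp(rng.normal(0.0, 0.1, 60))
    frames.append(pd.DataFrame({'store_id': 'CA_1', 'item_id': item_id,
                                'sell_price': price, 'sales': sales}))
result = loglog_elasticity_by_group(pd.concat(frames), ['store_id', 'item_id'])
print(result)
```

Dropping zero-sales days is only one option; it avoids the log offset used later in Section 6 but discards information about stockouts and slow sellers.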

## 3. Elasticity Distribution Analysis

In [None]:
# Distribution plot
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Histogram
axes[0].hist(valid_elasticities['elasticity'], bins=30, edgecolor='black', alpha=0.7)
axes[0].axvline(valid_elasticities['elasticity'].mean(), color='red', linestyle='--', 
                label=f"Mean: {valid_elasticities['elasticity'].mean():.2f}")
axes[0].axvline(valid_elasticities['elasticity'].median(), color='green', linestyle='--', 
                label=f"Median: {valid_elasticities['elasticity'].median():.2f}")
axes[0].axvline(-1, color='blue', linestyle=':', label='Unit Elastic (-1.0)')
axes[0].set_xlabel('Price Elasticity')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Distribution of Price Elasticities')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Box plot
axes[1].boxplot(valid_elasticities['elasticity'], vert=True)
axes[1].axhline(-1, color='blue', linestyle=':', label='Unit Elastic')
axes[1].set_ylabel('Price Elasticity')
axes[1].set_title('Elasticity Distribution (Box Plot)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# R-squared vs Elasticity
plt.figure(figsize=(10, 6))
plt.scatter(valid_elasticities['elasticity'], valid_elasticities['r_squared'], alpha=0.6)
plt.axvline(-1, color='blue', linestyle=':', label='Unit Elastic')
plt.axhline(0.3, color='red', linestyle='--', label='R² = 0.3 (Weak fit)')
plt.xlabel('Price Elasticity')
plt.ylabel('R² (Model Fit)')
plt.title('Elasticity vs Model Fit Quality')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# High-quality results (R² > 0.3)
high_quality = valid_elasticities[valid_elasticities['r_squared'] > 0.3]
print(f"\nHigh-quality elasticities (R² > 0.3): {len(high_quality)} ({len(high_quality) / len(valid_elasticities) * 100:.1f}%)")
print(f"Mean elasticity (high quality): {high_quality['elasticity'].mean():.2f}")

## 4. Segment Products by Elasticity

In [None]:
# Segment by elasticity category
segmented = analyzer.segment_by_elasticity(valid_elasticities)

print("Elasticity Category Distribution:")
print(segmented['elasticity_category'].value_counts())
print(f"\nPercentages:")
print(segmented['elasticity_category'].value_counts(normalize=True) * 100)

segmented.head(10)
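
`segment_by_elasticity` is defined in `src.pricing.elasticity`, and its cut-offs aren't shown here. The sketch below illustrates one way to map estimates onto the five labels this notebook plots, by binning |ε| with `pd.cut`. The thresholds in `BINS` are assumptions chosen for illustration and may not match the project's.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-offs on |ε|; the project's segment_by_elasticity may use different ones
BINS = [0.0, 0.5, 0.9, 1.1, 2.0, np.inf]
LABELS = ['Highly Inelastic', 'Inelastic', 'Unit Elastic', 'Elastic', 'Highly Elastic']

def segment(elasticities: pd.Series) -> pd.Series:
    # right=False gives half-open bins [a, b), so |ε| = 0 falls in the first bin.
    # abs() folds positive (upward-sloping) estimates into the same bins; those
    # usually deserve a separate look rather than a pricing label.
    return pd.cut(elasticities.abs(), bins=BINS, labels=LABELS, right=False)

print(segment(pd.Series([-0.2, -0.7, -1.0, -1.5, -3.0])).tolist())
```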

In [None]:
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Bar chart
category_counts = segmented['elasticity_category'].value_counts()
category_order = ['Highly Inelastic', 'Inelastic', 'Unit Elastic', 'Elastic', 'Highly Elastic']
category_counts = category_counts.reindex([c for c in category_order if c in category_counts.index])

colors = ['#d62728', '#ff7f0e', '#2ca02c', '#1f77b4', '#9467bd']
axes[0].bar(range(len(category_counts)), category_counts.values, color=colors[:len(category_counts)])
axes[0].set_xticks(range(len(category_counts)))
axes[0].set_xticklabels(category_counts.index, rotation=45, ha='right')
axes[0].set_ylabel('Number of Products')
axes[0].set_title('Product Distribution by Elasticity Category')
axes[0].grid(True, alpha=0.3, axis='y')

# Pie chart
axes[1].pie(category_counts.values, labels=category_counts.index, autopct='%1.1f%%',
            colors=colors[:len(category_counts)], startangle=90)
axes[1].set_title('Elasticity Category Proportions')

plt.tight_layout()
plt.show()

In [None]:
# Pricing recommendations summary
print("\nPricing Recommendations by Category:\n")
for category in category_order:
    if category in segmented['elasticity_category'].values:
        products = segmented[segmented['elasticity_category'] == category]
        recommendation = products['pricing_recommendation'].iloc[0]
        mean_e = products['elasticity'].mean()
        print(f"{category} (n={len(products)}, ε̄={mean_e:.2f}):")
        print(f"  → {recommendation}\n")

## 5. Category Analysis

Analyze elasticity patterns by product category (Foods, Hobbies, Household).

In [None]:
# Extract category from item_id (FOODS_X_XXX, HOBBIES_X_XXX, HOUSEHOLD_X_XXX)
segmented['category'] = segmented['item_id'].str.split('_').str[0]

# Category statistics
print("Elasticity by Product Category:\n")
category_stats = segmented.groupby('category')['elasticity'].agg([
    'count', 'mean', 'median', 'std', 'min', 'max'
])
print(category_stats)

# Statistical test (ANOVA)
categories = [segmented[segmented['category'] == cat]['elasticity'].values 
              for cat in segmented['category'].unique()]
f_stat, p_value = stats.f_oneway(*categories)
print(f"\nANOVA Test: F={f_stat:.2f}, p={p_value:.4f}")
if p_value < 0.05:
    print("→ Categories have significantly different elasticities")
else:
    print("→ No significant difference between categories")
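
One-way ANOVA assumes roughly normal, equal-variance groups, which skewed elasticity distributions may violate. A common rank-based cross-check is the Kruskal-Wallis H test (`scipy.stats.kruskal`), a swapped-in technique that is not part of the original analysis. The sketch below runs both tests on synthetic, skewed samples purely for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic, skewed elasticity samples for three categories (illustration only)
rng = np.random.default_rng(2)
foods = -np.exp(rng.normal(0.3, 0.5, 80))
hobbies = -np.exp(rng.normal(0.1, 0.5, 60))
household = -np.exp(rng.normal(-0.2, 0.5, 70))

# Parametric test, as in the cell above
f_stat, p_anova = stats.f_oneway(foods, hobbies, household)
# Rank-based alternative that does not assume normality
h_stat, p_kw = stats.kruskal(foods, hobbies, household)
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.4g}")
print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_kw:.4g}")
```

On the notebook's data, the equivalent call would be `stats.kruskal(*categories)`, reusing the `categories` list built for the ANOVA.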

In [None]:
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Box plot
segmented.boxplot(column='elasticity', by='category', ax=axes[0])
fig.suptitle('')  # Remove the automatic "Boxplot grouped by category" title pandas adds
axes[0].axhline(-1, color='red', linestyle='--', label='Unit Elastic')
axes[0].set_xlabel('Product Category')
axes[0].set_ylabel('Price Elasticity')
axes[0].set_title('Elasticity Distribution by Category')
axes[0].legend()
plt.sca(axes[0])
plt.xticks(rotation=0)

# Violin plot
categories_list = segmented['category'].unique()
positions = range(1, len(categories_list) + 1)
data_by_category = [segmented[segmented['category'] == cat]['elasticity'].values 
                    for cat in categories_list]

parts = axes[1].violinplot(data_by_category, positions=positions, showmeans=True, showmedians=True)
axes[1].axhline(-1, color='red', linestyle='--', label='Unit Elastic')
axes[1].set_xticks(positions)
axes[1].set_xticklabels(categories_list)
axes[1].set_xlabel('Product Category')
axes[1].set_ylabel('Price Elasticity')
axes[1].set_title('Elasticity Distribution by Category (Violin Plot)')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

In [None]:
# Category-specific elasticity categories
print("\nElasticity Category Distribution by Product Category:\n")
category_crosstab = pd.crosstab(
    segmented['category'],
    segmented['elasticity_category'],
    normalize='index'
) * 100

print(category_crosstab.round(1))

# Heatmap
plt.figure(figsize=(10, 4))
sns.heatmap(category_crosstab, annot=True, fmt='.1f', cmap='YlOrRd', cbar_kws={'label': 'Percentage'})
plt.title('Elasticity Categories by Product Category (%)')
plt.xlabel('Elasticity Category')
plt.ylabel('Product Category')
plt.tight_layout()
plt.show()

## 6. Example: Single Product Deep Dive

In [None]:
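# Select the store-item pair with the best log-log fit (R² > 0.4, highest first)
good_fits = segmented[segmented['r_squared'] > 0.4].sort_values('r_squared', ascending=False)
if good_fits.empty:
    raise ValueError("No store-item pair has R² > 0.4; lower the threshold to pick an example")
example_product = good_fits.iloc[0]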

print("Example Product Analysis:")
print(f"Store: {example_product['store_id']}")
print(f"Item: {example_product['item_id']}")
print(f"Elasticity: {example_product['elasticity']:.3f}")
print(f"Category: {example_product['elasticity_category']}")
print(f"R²: {example_product['r_squared']:.3f}")
print(f"Recommendation: {example_product['pricing_recommendation']}")

# Get product data
product_data = df[
    (df['store_id'] == example_product['store_id']) & 
    (df['item_id'] == example_product['item_id'])
].copy()

print(f"\nObservations: {len(product_data)}")
print(f"Date range: {product_data['date'].min()} to {product_data['date'].max()}")

In [None]:
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Time series: Price and Sales
ax1 = axes[0, 0]
ax1_twin = ax1.twinx()
ax1.plot(product_data['date'], product_data['sell_price'], color='blue', label='Price')
ax1_twin.bar(product_data['date'], product_data['sales'], alpha=0.3, color='green', label='Sales')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price ($)', color='blue')
ax1_twin.set_ylabel('Sales (units)', color='green')
ax1.set_title(f'Price and Sales Over Time\n{example_product["item_id"]}')
ax1.legend(loc='upper left')
ax1_twin.legend(loc='upper right')
ax1.grid(True, alpha=0.3)

# Scatter: Price vs Sales
axes[0, 1].scatter(product_data['sell_price'], product_data['sales'], alpha=0.5)
axes[0, 1].set_xlabel('Price ($)')
axes[0, 1].set_ylabel('Sales (units)')
axes[0, 1].set_title('Price vs Sales Relationship')
axes[0, 1].grid(True, alpha=0.3)

# Log-log plot
log_price = np.log(product_data['sell_price'])
log_sales = np.log(product_data['sales'] + 0.1)  # Offset keeps zero-sales days finite but biases the slope
axes[1, 0].scatter(log_price, log_sales, alpha=0.5)

# Add regression line
from sklearn.linear_model import LinearRegression
mask = np.isfinite(log_price) & np.isfinite(log_sales)
model = LinearRegression()
model.fit(log_price[mask].values.reshape(-1, 1), log_sales[mask].values)
x_line = np.linspace(log_price.min(), log_price.max(), 100)
y_line = model.predict(x_line.reshape(-1, 1))
axes[1, 0].plot(x_line, y_line, color='red', linestyle='--', 
                label=f'ε = {model.coef_[0]:.3f}')
axes[1, 0].set_xlabel('ln(Price)')
axes[1, 0].set_ylabel('ln(Sales)')
axes[1, 0].set_title('Log-Log Regression')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Revenue simulation
price_range = np.linspace(product_data['sell_price'].min() * 0.8, 
                          product_data['sell_price'].max() * 1.2, 50)
mean_sales = product_data['sales'].mean()
mean_price = product_data['sell_price'].mean()
elasticity = example_product['elasticity']

# Simulate sales with a constant-elasticity curve through the mean price and mean sales.
# Revenue ∝ P^(1+ε) is monotonic in P (flat when ε = -1), so its maximum
# falls on an edge of price_range rather than at an interior optimum.
simulated_sales = mean_sales * (price_range / mean_price) ** elasticity
simulated_revenue = price_range * simulated_sales

axes[1, 1].plot(price_range, simulated_revenue, linewidth=2)
axes[1, 1].axvline(mean_price, color='red', linestyle='--', label=f'Current: ${mean_price:.2f}')
optimal_idx = simulated_revenue.argmax()
optimal_price = price_range[optimal_idx]
axes[1, 1].axvline(optimal_price, color='green', linestyle='--', 
                   label=f'Revenue max in range: ${optimal_price:.2f}')
axes[1, 1].set_xlabel('Price ($)')
axes[1, 1].set_ylabel('Simulated Revenue ($)')
axes[1, 1].set_title('Revenue Optimization Curve')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nRevenue Analysis (constant-elasticity simulation, revenue only):")
print(f"Current mean price: ${mean_price:.2f}")
print(f"Revenue-maximizing price in simulated range: ${optimal_price:.2f} ({(optimal_price - mean_price) / mean_price * 100:+.1f}%)")
print(f"Simulated revenue change at that price: {(simulated_revenue[optimal_idx] / (mean_price * mean_sales) - 1) * 100:+.1f}%")
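
Under the constant-elasticity model used in this simulation, revenue is $R \propto P^{1+\epsilon}$, which is monotonic in $P$ for any $\epsilon \neq -1$. The revenue-maximizing price on a finite grid therefore always lands on an edge of the grid: the lower edge when demand is elastic, the upper edge when it is inelastic. The small check below uses synthetic numbers and a hypothetical helper, `revenue_argmax_on_grid`, that mirrors the simulation above.

```python
import numpy as np

def revenue_argmax_on_grid(elasticity, p_min=2.0, p_max=6.0,
                           base_price=4.0, base_sales=10.0, n=50):
    """Price that maximizes simulated revenue on a grid (same model as the cell above)."""
    prices = np.linspace(p_min, p_max, n)
    sales = base_sales * (prices / base_price) ** elasticity
    revenue = prices * sales
    return prices[revenue.argmax()]

for eps in (-2.0, -0.5):
    print(f"elasticity {eps}: revenue-maximizing grid price = {revenue_argmax_on_grid(eps):.2f}")
```

So the "optimal" price in the plot mainly reflects the ±20% search bounds. An interior optimum needs costs or a demand model whose elasticity varies with price.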

## 7. Cross-Price Elasticity Analysis

Identify substitute and complement products.
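
Cross-price elasticity $\epsilon_{AB} = \%\Delta Q_A / \%\Delta P_B$ measures how demand for product A responds to product B's price: positive values indicate substitutes, negative values complements. `calculate_cross_elasticity` isn't shown in this notebook. One common way to estimate it, sketched here on synthetic data, is to regress $\ln Q_A$ on both $\ln P_A$ and $\ln P_B$. The data and the plain sign rule are assumptions; the analyzer's "Independent" label suggests it also treats estimates near zero separately.

```python
import numpy as np

# Synthetic demand for product A with assumed own elasticity -1.5 and cross elasticity +0.6
rng = np.random.default_rng(3)
n = 300
p_a = rng.uniform(2.0, 4.0, n)
p_b = rng.uniform(2.0, 4.0, n)
q_a = 40.0 * p_a**-1.5 * p_b**0.6 * np.exp(rng.normal(0.0, 0.05, n))

# Fit ln(Q_A) = c + b_own * ln(P_A) + b_cross * ln(P_B) by least squares
X = np.column_stack([np.ones(n), np.log(p_a), np.log(p_b)])
coef, *_ = np.linalg.lstsq(X, np.log(q_a), rcond=None)
own, cross = coef[1], coef[2]

# Sign rule only; a real classifier would also treat estimates near zero as independent
relationship = 'Substitutes' if cross > 0 else 'Complements'
print(f"own = {own:.2f}, cross = {cross:.2f} -> {relationship}")
```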

In [None]:
# Select five FOODS items and estimate pairwise cross-elasticities within one store (CA_1)
foods_items = segmented[segmented['category'] == 'FOODS']['item_id'].unique()[:5]
store_data = df[df['store_id'] == 'CA_1']  # Filter once, outside the loop

print(f"Analyzing cross-elasticity for {len(foods_items)} FOODS products...\n")

cross_elasticities = []

for i, item_a in enumerate(foods_items):
    for item_b in foods_items[i+1:]:
        result = analyzer.calculate_cross_elasticity(
            product_a_id=item_a,
            product_b_id=item_b,
            data=store_data
        )
        if result['valid']:
            cross_elasticities.append(result)

cross_df = pd.DataFrame(cross_elasticities)

if len(cross_df) > 0:
    print(f"Valid cross-elasticity calculations: {len(cross_df)}\n")
    print(cross_df.head(10))
    
    # Relationship distribution
    print(f"\nRelationship Distribution:")
    print(cross_df['relationship'].value_counts())
else:
    print("No valid cross-elasticity results (may need more data overlap)")

In [None]:
# Visualize cross-elasticities if available
if len(cross_df) > 0:
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Distribution
    axes[0].hist(cross_df['cross_elasticity'], bins=20, edgecolor='black', alpha=0.7)
    axes[0].axvline(0, color='red', linestyle='--', label='Independent (0)')
    axes[0].set_xlabel('Cross-Price Elasticity')
    axes[0].set_ylabel('Frequency')
    axes[0].set_title('Distribution of Cross-Price Elasticities')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Relationship counts
    relationship_counts = cross_df['relationship'].value_counts()
    colors_rel = {'Substitutes': '#1f77b4', 'Complements': '#ff7f0e', 'Independent': '#2ca02c'}
    bar_colors = [colors_rel.get(rel, 'gray') for rel in relationship_counts.index]  # Fall back to gray for other labels
    axes[1].bar(range(len(relationship_counts)), relationship_counts.values, color=bar_colors)
    axes[1].set_xticks(range(len(relationship_counts)))
    axes[1].set_xticklabels(relationship_counts.index, rotation=0)
    axes[1].set_ylabel('Count')
    axes[1].set_title('Product Relationships')
    axes[1].grid(True, alpha=0.3, axis='y')
    
    plt.tight_layout()
    plt.show()

## 8. Summary and Export Results

In [None]:
# Get comprehensive summary
summary = analyzer.get_elasticity_summary(segmented)

print("=" * 60)
print("PRICE ELASTICITY ANALYSIS SUMMARY")
print("=" * 60)
print(f"\nDataset:")
print(f"  Total products analyzed: {summary['total_products']}")
print(f"  Valid elasticity results: {summary['valid_results']} ({summary['valid_percentage']:.1f}%)")
print(f"\nElasticity Statistics:")
print(f"  Mean: {summary['mean_elasticity']:.3f}")
print(f"  Median: {summary['median_elasticity']:.3f}")
print(f"  Std Dev: {summary['std_elasticity']:.3f}")
print(f"  Range: [{summary['min_elasticity']:.3f}, {summary['max_elasticity']:.3f}]")
print(f"\nModel Quality:")
print(f"  Mean R²: {summary['mean_r_squared']:.3f}")
print(f"\nCategory Distribution:")
for category, count in summary['category_distribution'].items():
    pct = count / summary['valid_results'] * 100
    print(f"  {category}: {count} ({pct:.1f}%)")
print("=" * 60)

In [None]:
# Export results
output_dir = project_root / 'data' / 'processed'
output_dir.mkdir(parents=True, exist_ok=True)

# Save elasticity results
elasticity_file = output_dir / 'elasticity_results.csv'
segmented.to_csv(elasticity_file, index=False)
print(f"✓ Saved elasticity results: {elasticity_file}")
print(f"  Rows: {len(segmented)}")
print(f"  Columns: {len(segmented.columns)}")

# Save cross-elasticity results
if len(cross_df) > 0:
    cross_file = output_dir / 'cross_elasticity_results.csv'
    cross_df.to_csv(cross_file, index=False)
    print(f"\n✓ Saved cross-elasticity results: {cross_file}")
    print(f"  Rows: {len(cross_df)}")

# Save summary as JSON
import json
summary_file = output_dir / 'elasticity_summary.json'
with open(summary_file, 'w') as f:
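    # default= converts NumPy scalars (e.g. np.int64), which json cannot serialize natively
    json.dump(summary, f, indent=2, default=lambda o: o.item() if hasattr(o, 'item') else str(o))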
print(f"\n✓ Saved summary: {summary_file}")

## 9. Key Insights and Recommendations

### Findings:

1. **Overall Elasticity**: Most products show elastic demand (|ε| > 1), suggesting price reductions could drive revenue growth

2. **Category Patterns**:
   - **FOODS**: More elastic (many close substitutes within the category)
   - **HOUSEHOLD**: Less elastic (necessities)
   - **HOBBIES**: Highly elastic (luxury/discretionary)

3. **Model Quality**: Many products clear the R² > 0.3 threshold used in Section 3; the rest need more data or a different modeling approach

4. **Cross-Elasticity**: Products within the same category tend to be substitutes (positive cross-elasticity)

### Strategic Recommendations:

1. **Elastic Products** (|ε| > 1):
   - Lower prices to maximize volume and revenue
   - Use promotional pricing strategically
   - Consider loss leaders for high-elasticity items

2. **Inelastic Products** (|ε| < 1):
   - Raise prices to increase revenue
   - Maintain stable pricing (customers less sensitive)
   - Focus on availability and quality over price

3. **Unit Elastic** (|ε| ≈ 1):
   - Revenue roughly neutral to small price changes
   - Use non-price strategies (marketing, placement)
   - Consider cost-plus pricing

4. **Cross-Elasticity Insights**:
   - Bundle complements together
   - Price substitutes competitively
   - Monitor competitor pricing for substitute products
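
To put numbers on these recommendations, assume (for illustration only) constant elasticity, so changing price by a fraction $r$ scales quantity by $(1+r)^{\epsilon}$ and revenue by $(1+r)^{1+\epsilon}$. The sketch applies a 10% price cut at three hypothetical elasticities; `revenue_change` is an illustrative helper, not part of the project.

```python
def revenue_change(elasticity: float, price_change: float) -> float:
    """Relative revenue change under an assumed constant-elasticity demand curve."""
    price_factor = 1.0 + price_change
    quantity_factor = price_factor ** elasticity
    return price_factor * quantity_factor - 1.0

for eps in (-2.0, -1.0, -0.5):
    print(f"elasticity {eps:+.1f}, 10% price cut -> revenue change {revenue_change(eps, -0.10):+.1%}")
```

The elastic case gains about 11% revenue from the cut, the unit-elastic case is essentially unchanged, and the inelastic case loses about 5%, which is why the inelastic recommendation runs the other way.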

### Next Steps:

- **Phase 4**: Build demand response models using these elasticities
- **Phase 5**: Implement price optimization engine
- **Phase 6**: Develop markdown strategies for end-of-life products
- **Phase 7**: Integrate competitive pricing analysis