# ML Dynamic Pricing - Phase 2: Elasticity-Based Optimization

Combines rule-based pricing with price elasticity modeling for revenue-optimal recommendations.

## Phase 1 Features (Rule-Based)
- **Age-based markdowns**: Products are marked down based on days since they last appeared on a receipt
- **Inventory triggers**: Overstocked items get price reductions
- **Seasonal clearance**: End-of-season products are marked down

## Phase 2 Features (NEW)
- **Elasticity integration**: Uses price elasticity coefficients from `gold_price_elasticity`
- **Revenue optimization**: Calculates optimal price point to maximize revenue
- **Inventory urgency**: Adjusts recommendations based on stock levels and sell-through rate
- **Revenue impact**: Estimates expected revenue change per recommendation
- **Confidence levels**: Provides confidence scores based on data quality and elasticity estimates

## Constraints
- Min Margin: 15% markup over cost (price must be >= cost * 1.15)
- Max Price: MSRP (cannot exceed manufacturer suggested price)
- Max Change/Week: 10% (limit price volatility)
- Min Duration: 7 days (avoid frequent changes)
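
The two static bounds (margin floor and MSRP cap) can be sketched in plain Python. This is a minimal illustration rather than the notebook's Spark code; `clamp_price` and its parameter names are hypothetical.

```python
def clamp_price(raw_price, cost, msrp, min_margin_pct=0.15, max_price_factor=1.0):
    """Clamp a candidate price to [cost * (1 + margin), msrp * factor]."""
    min_price = cost * (1 + min_margin_pct)
    max_price = msrp * max_price_factor
    # The floor is applied last, so it wins if the two bounds ever conflict.
    return max(min_price, min(raw_price, max_price))


print(clamp_price(raw_price=8.00, cost=10.00, msrp=20.00))   # raised to the margin floor
print(clamp_price(raw_price=25.00, cost=10.00, msrp=20.00))  # capped at MSRP
print(clamp_price(raw_price=15.00, cost=10.00, msrp=20.00))  # already within bounds
```

Applying the floor after the cap means a product whose cost-plus-margin exceeds MSRP ends up priced above MSRP; whether that is the desired tie-break is a business decision.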

## Data Flow
```
Silver (dim_products, fact_store_inventory_txn, fact_receipt_lines)
  + Gold (gold_price_elasticity)
  --> Gold (gold_pricing_recommendations)
```

## Usage
Schedule this notebook to run **daily** via Fabric pipeline.

## Output Schema
The `gold_pricing_recommendations` table includes:
- Current price and recommended price
- Reason codes for transparency
- Constraint validation flags
- Price elasticity coefficient and category
- Expected revenue impact
- Confidence level (HIGH/MEDIUM/LOW) and confidence score

In [None]:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
from pyspark.sql.utils import AnalysisException
from datetime import datetime, timezone, timedelta
import os

In [None]:
# =============================================================================
# PARAMETERS
# =============================================================================

def get_env(var_name, default=None):
    return os.environ.get(var_name, default)

SILVER_DB = get_env("SILVER_DB", default="ag")
GOLD_DB = get_env("GOLD_DB", default="au")

# Pricing constraints (business rules)
MIN_MARGIN_PCT = 0.15  # 15% minimum margin
MAX_PRICE_FACTOR = 1.0  # Max price = MSRP
MAX_CHANGE_PCT_WEEKLY = 0.10  # Max 10% change per week
MIN_DURATION_DAYS = 7  # Min 7 days between price changes

# Markdown rules (Phase 1)
AGE_THRESHOLD_MODERATE = 30  # Days - start moderate markdown
AGE_THRESHOLD_AGGRESSIVE = 60  # Days - aggressive markdown
AGE_MARKDOWN_MODERATE = 0.10  # 10% markdown
AGE_MARKDOWN_AGGRESSIVE = 0.25  # 25% markdown

INVENTORY_HIGH_THRESHOLD = 100  # Units - considered overstocked
INVENTORY_MARKDOWN = 0.15  # 15% markdown for overstocked

# Seasonal clearance (example: Q1 clearance of winter/holiday items)
CLEARANCE_MARKDOWN = 0.30  # 30% markdown for clearance

# Phase 2: Elasticity-based optimization parameters
ELASTICITY_WEIGHT = 0.6  # Weight for elasticity-based price vs rule-based
INVENTORY_URGENCY_DAYS = 14  # Days of inventory to trigger urgency
MIN_OBSERVATIONS_FOR_ELASTICITY = 20  # Min data points to use elasticity

print(f"Configuration: SILVER_DB={SILVER_DB}, GOLD_DB={GOLD_DB}")
print(f"Constraints: Min Margin={MIN_MARGIN_PCT*100}%, Max Price=MSRP, Max Change/Week={MAX_CHANGE_PCT_WEEKLY*100}%")
print(f"Phase 2: ELASTICITY_WEIGHT={ELASTICITY_WEIGHT}, INVENTORY_URGENCY_DAYS={INVENTORY_URGENCY_DAYS}")

In [None]:
# =============================================================================
# HELPER FUNCTIONS
# =============================================================================

def ensure_database(name):
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {name}")

def read_silver(table_name):
    return spark.table(f"{SILVER_DB}.{table_name}")

def read_gold(table_name):
    try:
        return spark.table(f"{GOLD_DB}.{table_name}")
    except AnalysisException:
        return None

def save_gold(df, table_name):
    full_name = f"{GOLD_DB}.{table_name}"
    df.write.format("delta").mode("overwrite").saveAsTable(full_name)
    # Count from the saved table so the source DataFrame is not recomputed
    print(f"  {full_name}: {spark.table(full_name).count()} rows")

def silver_exists(table_name):
    try:
        spark.table(f"{SILVER_DB}.{table_name}")
        return True
    except AnalysisException:
        return False

ensure_database(GOLD_DB)

In [None]:
# =============================================================================
# PRICING CONSTRAINTS TABLE
# =============================================================================

print("="*60)
print("CREATING PRICING CONSTRAINTS TABLE")
print("="*60)

# Create constraints reference table for transparency and configurability
constraints_data = [
    {"constraint_name": "min_margin_pct", "constraint_value": MIN_MARGIN_PCT, "description": "Minimum margin percentage (price >= cost * 1.15)"},
    {"constraint_name": "max_price_factor", "constraint_value": MAX_PRICE_FACTOR, "description": "Maximum price as factor of MSRP (1.0 = cannot exceed MSRP)"},
    {"constraint_name": "max_change_pct_weekly", "constraint_value": MAX_CHANGE_PCT_WEEKLY, "description": "Maximum price change per week (as percentage)"},
    {"constraint_name": "min_duration_days", "constraint_value": float(MIN_DURATION_DAYS), "description": "Minimum days between price changes"},
    {"constraint_name": "age_threshold_moderate", "constraint_value": float(AGE_THRESHOLD_MODERATE), "description": "Days to trigger moderate markdown"},
    {"constraint_name": "age_threshold_aggressive", "constraint_value": float(AGE_THRESHOLD_AGGRESSIVE), "description": "Days to trigger aggressive markdown"},
    {"constraint_name": "age_markdown_moderate", "constraint_value": AGE_MARKDOWN_MODERATE, "description": "Markdown percentage for moderate age threshold"},
    {"constraint_name": "age_markdown_aggressive", "constraint_value": AGE_MARKDOWN_AGGRESSIVE, "description": "Markdown percentage for aggressive age threshold"},
    {"constraint_name": "inventory_high_threshold", "constraint_value": float(INVENTORY_HIGH_THRESHOLD), "description": "Units threshold for overstocked items"},
    {"constraint_name": "inventory_markdown", "constraint_value": INVENTORY_MARKDOWN, "description": "Markdown percentage for overstocked items"},
    {"constraint_name": "clearance_markdown", "constraint_value": CLEARANCE_MARKDOWN, "description": "Markdown percentage for seasonal clearance"},
    {"constraint_name": "elasticity_weight", "constraint_value": ELASTICITY_WEIGHT, "description": "Weight for elasticity-based pricing (Phase 2)"},
    {"constraint_name": "inventory_urgency_days", "constraint_value": float(INVENTORY_URGENCY_DAYS), "description": "Days of inventory to trigger urgency (Phase 2)"},
]

df_constraints = spark.createDataFrame(constraints_data)
save_gold(df_constraints, "pricing_constraints")
print()

In [None]:
# =============================================================================
# LOAD SOURCE DATA
# =============================================================================

print("="*60)
print("LOADING SOURCE DATA")
print("="*60)

if not silver_exists("dim_products"):
    raise Exception("dim_products table not found. Run 02-historical-data-load.ipynb first.")

# Load product master with pricing data
df_products = read_silver("dim_products").select(
    F.col("ID").alias("product_id"),
    "ProductName",
    "Department",
    "Category",
    "Subcategory",
    "Cost",
    "MSRP",
    "SalePrice",
    "Tags"
)

print(f"Products loaded: {df_products.count()}")

# Load current inventory positions (aggregated across all stores)
df_inventory = None
if silver_exists("fact_store_inventory_txn"):
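    # Keep the latest balance per store and product, then sum across stores.
    # ASSUMPTION: the transaction table has a `store_id` column. Partitioning by
    # product_id alone would keep only one store's most recent balance.
    window_spec = Window.partitionBy("store_id", "product_id").orderBy(F.desc("event_ts"))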
    df_inventory = (
        read_silver("fact_store_inventory_txn")
        .withColumn("rn", F.row_number().over(window_spec))
        .filter(F.col("rn") == 1)
        .groupBy("product_id")
        .agg(
            F.sum("balance").alias("total_inventory"),
            F.max("event_ts").alias("inventory_as_of")
        )
    )
    print(f"Inventory positions loaded: {df_inventory.count()}")
else:
    print("No inventory data available - using product data only")

# Load previous pricing recommendations to check duration constraint
df_previous_prices = read_gold("gold_pricing_recommendations")
if df_previous_prices is not None:
    print(f"Previous pricing recommendations loaded: {df_previous_prices.count()}")
else:
    print("No previous pricing recommendations - first run")

# Phase 2: Load price elasticity data
df_elasticity = read_gold("gold_price_elasticity")
if df_elasticity is not None:
    print(f"Price elasticity data loaded: {df_elasticity.count()} products")
else:
    print("WARNING: No price elasticity data found - Phase 2 optimization will be limited")
    print("Run 11-ml-promotion-effectiveness.ipynb to generate elasticity data")

print()
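
To make the "current inventory across all stores" step concrete, here is a small plain-Python sketch: keep the most recent balance for each (store, product) pair, then sum those balances per product. The `store_id` key and the tuple layout are assumptions for illustration; the actual columns in `fact_store_inventory_txn` may differ.

```python
from collections import defaultdict


def total_inventory(txns):
    """txns: iterable of (store_id, product_id, event_ts, balance) tuples."""
    latest = {}
    for store_id, product_id, event_ts, balance in txns:
        key = (store_id, product_id)
        # Keep only the most recent transaction per store and product.
        if key not in latest or event_ts > latest[key][0]:
            latest[key] = (event_ts, balance)

    totals = defaultdict(int)
    for (_, product_id), (_, balance) in latest.items():
        totals[product_id] += balance
    return dict(totals)


txns = [
    ("S1", "P1", 1, 40),
    ("S1", "P1", 2, 35),  # supersedes the earlier S1/P1 balance
    ("S2", "P1", 1, 20),
    ("S2", "P2", 3, 7),
]
print(total_inventory(txns))
```

Here P1 totals 55 units (35 from S1 plus 20 from S2) and P2 totals 7.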

In [None]:
# =============================================================================
# CALCULATE PRODUCT AGE AND SALES VELOCITY
# =============================================================================

print("="*60)
print("CALCULATING PRODUCT AGE AND SALES VELOCITY")
print("="*60)

df_product_age = None
df_sales_velocity = None

if silver_exists("fact_receipt_lines"):
    # Calculate days since first/last receipt and recent sales velocity
    df_receipt_lines = read_silver("fact_receipt_lines")
    
    # Product age
    df_product_age = (
        df_receipt_lines
        .groupBy("product_id")
        .agg(
            F.min("event_ts").alias("first_receipt_ts"),
            F.max("event_ts").alias("last_receipt_ts")
        )
        .withColumn("current_ts", F.current_timestamp())
        .withColumn(
            "days_since_first_receipt",
            F.datediff(F.col("current_ts"), F.col("first_receipt_ts"))
        )
        .withColumn(
            "days_since_last_receipt",
            F.datediff(F.col("current_ts"), F.col("last_receipt_ts"))
        )
        .select("product_id", "days_since_first_receipt", "days_since_last_receipt")
    )
    print(f"Product age calculated for {df_product_age.count()} products")
    
    # Sales velocity (last 30 days)
    thirty_days_ago = F.date_sub(F.current_date(), 30)
    df_sales_velocity = (
        df_receipt_lines
        .filter(F.col("event_ts") >= thirty_days_ago)
        .groupBy("product_id")
        .agg(
            F.sum("quantity").alias("units_sold_30d"),
            F.countDistinct(F.to_date("event_ts")).alias("days_with_sales")
        )
        .withColumn(
            "avg_daily_sales",
            F.col("units_sold_30d") / F.greatest(F.col("days_with_sales"), F.lit(1))
        )
        .select("product_id", "units_sold_30d", "avg_daily_sales")
    )
    print(f"Sales velocity calculated for {df_sales_velocity.count()} products")
else:
    print("No receipt data - product age and sales velocity will be 0")

print()
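
Note that `avg_daily_sales` divides by the number of days that had at least one sale, not by the 30-day window length, so it measures the sales rate on selling days. A hedged plain-Python sketch of the same arithmetic (the function name and input layout are hypothetical):

```python
from datetime import date


def sales_velocity(lines):
    """lines: list of (sale_date, quantity) for one product within the window."""
    units = sum(qty for _, qty in lines)
    days_with_sales = len({d for d, _ in lines})
    # Guard against an empty window, mirroring greatest(days_with_sales, 1).
    return units, units / max(days_with_sales, 1)


lines = [(date(2024, 1, 2), 3), (date(2024, 1, 2), 1), (date(2024, 1, 9), 4)]
units, avg_daily = sales_velocity(lines)
print(units, avg_daily)
```

With 8 units sold on 2 distinct days, the rate is 4 units per selling day; dividing by the full 30-day window instead would give roughly 0.27 units per day, which would make days-of-inventory estimates much larger.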

In [None]:
# =============================================================================
# PHASE 1: RULE-BASED MARKDOWN LOGIC
# =============================================================================

print("="*60)
print("PHASE 1: APPLYING RULE-BASED MARKDOWN LOGIC")
print("="*60)

# Start with product master
df_pricing = df_products

# Join with inventory if available
if df_inventory is not None:
    df_pricing = df_pricing.join(df_inventory, "product_id", "left")
else:
    df_pricing = df_pricing.withColumn("total_inventory", F.lit(None).cast("long"))
    df_pricing = df_pricing.withColumn("inventory_as_of", F.lit(None).cast("timestamp"))

# Join with product age if available
if df_product_age is not None:
    df_pricing = df_pricing.join(df_product_age, "product_id", "left")
else:
    df_pricing = df_pricing.withColumn("days_since_first_receipt", F.lit(0))
    df_pricing = df_pricing.withColumn("days_since_last_receipt", F.lit(0))

# Join with sales velocity if available
if df_sales_velocity is not None:
    df_pricing = df_pricing.join(df_sales_velocity, "product_id", "left")
else:
    df_pricing = df_pricing.withColumn("units_sold_30d", F.lit(0))
    df_pricing = df_pricing.withColumn("avg_daily_sales", F.lit(0.0))

# Initialize markdown factors and reason codes
df_pricing = df_pricing.withColumn("markdown_factor", F.lit(0.0))
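# Cast explicitly so the empty array is array<string>, matching the codes appended below
df_pricing = df_pricing.withColumn("reason_codes", F.array().cast("array<string>"))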

# Rule 1: Age-based markdowns
df_pricing = df_pricing.withColumn(
    "age_markdown",
    F.when(
        F.col("days_since_last_receipt") >= AGE_THRESHOLD_AGGRESSIVE,
        F.lit(AGE_MARKDOWN_AGGRESSIVE)
    ).when(
        F.col("days_since_last_receipt") >= AGE_THRESHOLD_MODERATE,
        F.lit(AGE_MARKDOWN_MODERATE)
    ).otherwise(F.lit(0.0))
)

df_pricing = df_pricing.withColumn(
    "markdown_factor",
    F.greatest(F.col("markdown_factor"), F.col("age_markdown"))
)

df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.col("age_markdown") == AGE_MARKDOWN_AGGRESSIVE,
        F.array_union(F.col("reason_codes"), F.array(F.lit("AGE_AGGRESSIVE")))
    ).when(
        F.col("age_markdown") == AGE_MARKDOWN_MODERATE,
        F.array_union(F.col("reason_codes"), F.array(F.lit("AGE_MODERATE")))
    ).otherwise(F.col("reason_codes"))
)

# Rule 2: Inventory-based markdowns
df_pricing = df_pricing.withColumn(
    "inventory_markdown",
    F.when(
        F.col("total_inventory") >= INVENTORY_HIGH_THRESHOLD,
        F.lit(INVENTORY_MARKDOWN)
    ).otherwise(F.lit(0.0))
)

df_pricing = df_pricing.withColumn(
    "markdown_factor",
    F.greatest(F.col("markdown_factor"), F.col("inventory_markdown"))
)

df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.col("inventory_markdown") > 0,
        F.array_union(F.col("reason_codes"), F.array(F.lit("INVENTORY_HIGH")))
    ).otherwise(F.col("reason_codes"))
)

# Rule 3: Seasonal clearance (example: products tagged with winter/holiday in Q1)
current_month = datetime.now(timezone.utc).month
is_clearance_period = current_month in [1, 2, 3]  # Q1 for winter clearance

if is_clearance_period:
    df_pricing = df_pricing.withColumn(
        "seasonal_markdown",
        F.when(
            (F.col("Tags").isNotNull()) & 
            ((F.lower(F.col("Tags")).contains("winter")) | 
             (F.lower(F.col("Tags")).contains("holiday"))),
            F.lit(CLEARANCE_MARKDOWN)
        ).otherwise(F.lit(0.0))
    )
    
    df_pricing = df_pricing.withColumn(
        "markdown_factor",
        F.greatest(F.col("markdown_factor"), F.col("seasonal_markdown"))
    )
    
    df_pricing = df_pricing.withColumn(
        "reason_codes",
        F.when(
            F.col("seasonal_markdown") > 0,
            F.array_union(F.col("reason_codes"), F.array(F.lit("SEASONAL_CLEARANCE")))
        ).otherwise(F.col("reason_codes"))
    )

# Calculate rule-based recommended price
df_pricing = df_pricing.withColumn(
    "rule_based_price",
    F.col("SalePrice") * (1 - F.col("markdown_factor"))
)

print(f"Rule-based markdown logic applied to {df_pricing.count()} products")
print()
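
The rules above do not stack: each rule proposes a markdown, the largest one becomes `markdown_factor`, and every rule that fired contributes a reason code. A minimal plain-Python sketch using the notebook's default thresholds (the function itself is hypothetical):

```python
def rule_based_markdown(days_since_last_receipt, total_inventory, tags, month):
    """Return (markdown_factor, reason_codes) for one product."""
    candidates = []
    if days_since_last_receipt >= 60:
        candidates.append((0.25, "AGE_AGGRESSIVE"))
    elif days_since_last_receipt >= 30:
        candidates.append((0.10, "AGE_MODERATE"))
    if total_inventory >= 100:
        candidates.append((0.15, "INVENTORY_HIGH"))
    tags_lower = (tags or "").lower()
    if month in (1, 2, 3) and ("winter" in tags_lower or "holiday" in tags_lower):
        candidates.append((0.30, "SEASONAL_CLEARANCE"))

    # The largest single markdown wins; markdowns are not summed.
    factor = max((m for m, _ in candidates), default=0.0)
    return factor, [code for _, code in candidates]


factor, reasons = rule_based_markdown(45, 120, "Winter,Outdoor", month=2)
print(factor, reasons)
```

For this product all three rules fire, but the final markdown is the 30% seasonal clearance rather than the 55% sum.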

In [None]:
# =============================================================================
# PHASE 2: ELASTICITY-BASED OPTIMIZATION
# =============================================================================

print("="*60)
print("PHASE 2: APPLYING ELASTICITY-BASED OPTIMIZATION")
print("="*60)

if df_elasticity is not None:
    # Join elasticity data
    df_pricing = df_pricing.join(
        df_elasticity.select(
            "product_id",
            "elasticity_coefficient",
            "elasticity_category",
            "n_observations",
            F.col("confidence_interval_lower").alias("elasticity_ci_lower"),
            F.col("confidence_interval_upper").alias("elasticity_ci_upper")
        ),
        "product_id",
        "left"
    )
    
    # Calculate the elasticity-based price with the constant-elasticity markup rule:
    #   optimal_price = cost * |e| / (|e| - 1), valid only for elastic demand (e < -1)
    # This maximizes profit = (price - cost) * quantity. For -1 <= e < 0 there is no
    # finite optimum, so those products fall back to the rule-based price.
    df_pricing = df_pricing.withColumn(
        "elasticity_optimal_price",
        F.when(
            (F.col("elasticity_coefficient").isNotNull()) & 
            (F.col("elasticity_coefficient") < -1.0) &  # Elastic demand only
            (F.col("n_observations") >= MIN_OBSERVATIONS_FOR_ELASTICITY),
            F.col("Cost") * (F.abs(F.col("elasticity_coefficient")) / (F.abs(F.col("elasticity_coefficient")) - 1))
        ).otherwise(F.lit(None))
    )
    
    # Blend rule-based and elasticity-based prices
    df_pricing = df_pricing.withColumn(
        "blended_price",
        F.when(
            F.col("elasticity_optimal_price").isNotNull(),
            (F.col("elasticity_optimal_price") * ELASTICITY_WEIGHT) + 
            (F.col("rule_based_price") * (1 - ELASTICITY_WEIGHT))
        ).otherwise(F.col("rule_based_price"))
    )
    
    # Add elasticity reason code
    df_pricing = df_pricing.withColumn(
        "reason_codes",
        F.when(
            F.col("elasticity_optimal_price").isNotNull(),
            F.array_union(F.col("reason_codes"), F.array(F.lit("ELASTICITY_OPTIMIZED")))
        ).otherwise(F.col("reason_codes"))
    )
    
    print("Elasticity optimization applied to products with sufficient data")
else:
    # No elasticity data - use rule-based price only
    df_pricing = df_pricing.withColumn("elasticity_coefficient", F.lit(None).cast("decimal(10,4)"))
    df_pricing = df_pricing.withColumn("elasticity_category", F.lit(None).cast("string"))
    df_pricing = df_pricing.withColumn("n_observations", F.lit(None).cast("int"))
    df_pricing = df_pricing.withColumn("elasticity_ci_lower", F.lit(None).cast("decimal(10,4)"))
    df_pricing = df_pricing.withColumn("elasticity_ci_upper", F.lit(None).cast("decimal(10,4)"))
    df_pricing = df_pricing.withColumn("elasticity_optimal_price", F.lit(None).cast("decimal(10,2)"))
    df_pricing = df_pricing.withColumn("blended_price", F.col("rule_based_price"))
    print("No elasticity data available - using rule-based prices only")

print()
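
For reference, the standard markup rule under constant-elasticity demand follows from maximizing profit (p - c) * q with q proportional to p^e, which gives (p - c) / p = 1 / |e| and hence p = c * |e| / (|e| - 1) for e < -1. The sketch below pairs that rule with the weighted blend; both function names are hypothetical, and the 0.6 weight is the notebook's `ELASTICITY_WEIGHT` default.

```python
def markup_rule_price(cost, elasticity):
    """Profit-maximizing price under constant-elasticity demand.

    Only defined for elastic demand (elasticity < -1); returns None otherwise.
    """
    if elasticity is None or elasticity >= -1.0:
        return None
    e = abs(elasticity)
    return cost * e / (e - 1)


def blend(rule_price, elasticity_price, weight=0.6):
    """Weighted average of the two prices; fall back to the rule price."""
    if elasticity_price is None:
        return rule_price
    return weight * elasticity_price + (1 - weight) * rule_price


p_star = markup_rule_price(cost=10.0, elasticity=-2.0)
print(p_star, blend(rule_price=18.0, elasticity_price=p_star))
print(markup_rule_price(cost=10.0, elasticity=-0.5))  # inelastic: no finite optimum
```

With cost 10 and elasticity -2 the rule gives a price of 20, i.e. a 50% margin on price. As elasticity approaches -1 the rule's price grows without bound, which is one reason the MSRP cap downstream matters.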

In [None]:
# =============================================================================
# PHASE 2: INVENTORY URGENCY ADJUSTMENT
# =============================================================================

print("="*60)
print("PHASE 2: APPLYING INVENTORY URGENCY ADJUSTMENT")
print("="*60)

# Calculate days of inventory on hand
df_pricing = df_pricing.withColumn(
    "days_of_inventory",
    F.when(
        (F.col("avg_daily_sales") > 0.1),  # Avoid division by very small numbers
        F.col("total_inventory") / F.col("avg_daily_sales")
    ).otherwise(F.lit(999))  # High value = no urgency
)

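# Apply an urgency discount when days of inventory <= INVENTORY_URGENCY_DAYS:
# ramps linearly from 5% at the threshold to 15% at zero days of inventory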
df_pricing = df_pricing.withColumn(
    "urgency_markdown",
    F.when(
        F.col("days_of_inventory") <= INVENTORY_URGENCY_DAYS,
        0.05 + (0.10 * (1 - F.col("days_of_inventory") / INVENTORY_URGENCY_DAYS))
    ).otherwise(F.lit(0.0))
)

# Apply urgency adjustment to blended price
df_pricing = df_pricing.withColumn(
    "recommended_price_raw",
    F.col("blended_price") * (1 - F.col("urgency_markdown"))
)

# Add urgency reason code
df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.col("urgency_markdown") > 0,
        F.array_union(F.col("reason_codes"), F.array(F.lit("INVENTORY_URGENCY")))
    ).otherwise(F.col("reason_codes"))
)

print("Inventory urgency adjustment applied")
print()
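
The urgency markdown is a linear ramp, which is easier to see outside of Spark. A minimal sketch of the same formula, assuming the default `INVENTORY_URGENCY_DAYS` of 14 (the function name is hypothetical):

```python
def urgency_markdown(days_of_inventory, urgency_days=14):
    """5% at the threshold, rising linearly to 15% at zero days of inventory."""
    if days_of_inventory is None or days_of_inventory > urgency_days:
        return 0.0
    return 0.05 + 0.10 * (1 - days_of_inventory / urgency_days)


for days in (0, 7, 14, 30):
    print(days, round(urgency_markdown(days), 4))
```

Products with no measurable sales rate get `days_of_inventory = 999` in the notebook, so they never receive this markdown.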

In [None]:
# =============================================================================
# APPLY BUSINESS CONSTRAINTS
# =============================================================================

print("="*60)
print("APPLYING BUSINESS CONSTRAINTS")
print("="*60)

# Constraint 1: Minimum margin (price >= cost * 1.15)
df_pricing = df_pricing.withColumn(
    "min_price",
    F.col("Cost") * (1 + MIN_MARGIN_PCT)
)

# Constraint 2: Maximum price (cannot exceed MSRP)
df_pricing = df_pricing.withColumn(
    "max_price",
    F.col("MSRP") * MAX_PRICE_FACTOR
)

# Apply min/max constraints
df_pricing = df_pricing.withColumn(
    "recommended_price",
    F.greatest(
        F.col("min_price"),
        F.least(F.col("recommended_price_raw"), F.col("max_price"))
    )
)

# Track constraint violations
df_pricing = df_pricing.withColumn(
    "hit_min_margin",
    F.col("recommended_price_raw") < F.col("min_price")
)

df_pricing = df_pricing.withColumn(
    "hit_max_price",
    F.col("recommended_price_raw") > F.col("max_price")
)

# Add constraint violation to reason codes
df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.col("hit_min_margin"),
        F.array_union(F.col("reason_codes"), F.array(F.lit("CONSTRAINT_MIN_MARGIN")))
    ).otherwise(F.col("reason_codes"))
)

df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.col("hit_max_price"),
        F.array_union(F.col("reason_codes"), F.array(F.lit("CONSTRAINT_MAX_PRICE")))
    ).otherwise(F.col("reason_codes"))
)

# Constraint 3: Max change per week (10%)
if df_previous_prices is not None:
    df_prev = df_previous_prices.select(
        "product_id",
        F.col("recommended_price").alias("previous_price"),
        F.col("recommendation_ts").alias("previous_ts")
    )
    
    df_pricing = df_pricing.join(df_prev, "product_id", "left")
    
    df_pricing = df_pricing.withColumn(
        "days_since_last_change",
        F.datediff(F.current_timestamp(), F.col("previous_ts"))
    )
    
    # Check if change exceeds 10% and duration < 7 days
    df_pricing = df_pricing.withColumn(
        "change_pct",
        F.abs((F.col("recommended_price") - F.col("previous_price")) / F.col("previous_price"))
    )
    
    df_pricing = df_pricing.withColumn(
        "violates_max_change",
        F.coalesce(
            (F.col("change_pct") > MAX_CHANGE_PCT_WEEKLY) & 
            (F.col("days_since_last_change") < MIN_DURATION_DAYS),
            F.lit(False)  # No previous recommendation means nothing to violate
        )
    )
    
    # If violates, keep previous price
    df_pricing = df_pricing.withColumn(
        "recommended_price",
        F.when(
            F.col("violates_max_change") & F.col("previous_price").isNotNull(),
            F.col("previous_price")
        ).otherwise(F.col("recommended_price"))
    )
    
    df_pricing = df_pricing.withColumn(
        "reason_codes",
        F.when(
            F.col("violates_max_change"),
            F.array_union(F.col("reason_codes"), F.array(F.lit("CONSTRAINT_MAX_CHANGE")))
        ).otherwise(F.col("reason_codes"))
    )
else:
    df_pricing = df_pricing.withColumn("previous_price", F.lit(None).cast("decimal(10,2)"))
    df_pricing = df_pricing.withColumn("previous_ts", F.lit(None).cast("timestamp"))
    df_pricing = df_pricing.withColumn("days_since_last_change", F.lit(None).cast("int"))
    df_pricing = df_pricing.withColumn("change_pct", F.lit(None).cast("decimal(5,2)"))
    df_pricing = df_pricing.withColumn("violates_max_change", F.lit(False))

print(f"Constraints applied to {df_pricing.count()} products")
print()
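
The change-limit check holds the previous price rather than capping the move at 10%: a recommendation is rejected only when the change is larger than the weekly limit and the previous recommendation is less than the minimum duration old. A plain-Python sketch of that decision, with a hypothetical function name and the notebook's default limits:

```python
def apply_change_limit(new_price, previous_price, days_since_last_change,
                       max_change_pct=0.10, min_duration_days=7):
    """Return (price, held); held=True means the previous price was kept."""
    if previous_price is None or days_since_last_change is None:
        return new_price, False  # first run for this product
    change_pct = abs(new_price - previous_price) / previous_price
    if change_pct > max_change_pct and days_since_last_change < min_duration_days:
        return previous_price, True
    return new_price, False


print(apply_change_limit(17.0, 20.0, days_since_last_change=3))   # 15% move, too soon
print(apply_change_limit(17.0, 20.0, days_since_last_change=10))  # 15% move, old enough
print(apply_change_limit(19.0, 20.0, days_since_last_change=3))   # 5% move, allowed
```

Because both conditions must hold, small changes can still happen daily, and large changes go through once the previous recommendation is at least seven days old.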

In [None]:
# =============================================================================
# PHASE 2: CALCULATE REVENUE IMPACT AND CONFIDENCE
# =============================================================================

print("="*60)
print("PHASE 2: CALCULATING REVENUE IMPACT AND CONFIDENCE")
print("="*60)

# Calculate expected revenue impact
# Formula: revenue_impact = new_price * expected_quantity - old_price * current_quantity
# where expected_quantity is adjusted by elasticity if available

df_pricing = df_pricing.withColumn(
    "price_change_pct",
    F.round((F.col("recommended_price") - F.col("SalePrice")) / F.col("SalePrice") * 100, 2)
)

# Estimate quantity change using elasticity
df_pricing = df_pricing.withColumn(
    "expected_quantity_change_pct",
    F.when(
        F.col("elasticity_coefficient").isNotNull(),
        F.col("elasticity_coefficient") * F.col("price_change_pct")
    ).otherwise(F.lit(0.0))  # Assume no quantity change if no elasticity data
)

df_pricing = df_pricing.withColumn(
    "expected_daily_quantity",
    F.col("avg_daily_sales") * (1 + F.col("expected_quantity_change_pct") / 100)
)

# Calculate daily revenue impact
df_pricing = df_pricing.withColumn(
    "current_daily_revenue",
    F.col("SalePrice") * F.col("avg_daily_sales")
)

df_pricing = df_pricing.withColumn(
    "expected_daily_revenue",
    F.col("recommended_price") * F.col("expected_daily_quantity")
)

df_pricing = df_pricing.withColumn(
    "daily_revenue_impact",
    F.col("expected_daily_revenue") - F.col("current_daily_revenue")
)

# Calculate 30-day revenue impact projection
df_pricing = df_pricing.withColumn(
    "projected_revenue_impact_30d",
    F.col("daily_revenue_impact") * 30
)

# Determine confidence level
df_pricing = df_pricing.withColumn(
    "confidence_level",
    F.when(
        (F.col("n_observations") >= 50) & 
        (F.col("elasticity_coefficient").isNotNull()) &
        (F.col("avg_daily_sales") >= 1.0),
        "HIGH"
    ).when(
        (F.col("n_observations") >= MIN_OBSERVATIONS_FOR_ELASTICITY) & 
        (F.col("elasticity_coefficient").isNotNull()),
        "MEDIUM"
    ).otherwise("LOW")
)

# Calculate confidence score (0-1)
df_pricing = df_pricing.withColumn(
    "confidence_score",
    F.when(
        F.col("confidence_level") == "HIGH", F.lit(0.85)
    ).when(
        F.col("confidence_level") == "MEDIUM", F.lit(0.60)
    ).otherwise(F.lit(0.35))
)

# Adjust confidence based on elasticity CI width
df_pricing = df_pricing.withColumn(
    "elasticity_ci_width",
    F.when(
        (F.col("elasticity_ci_upper").isNotNull()) & (F.col("elasticity_ci_lower").isNotNull()),
        F.col("elasticity_ci_upper") - F.col("elasticity_ci_lower")
    ).otherwise(F.lit(None))
)

df_pricing = df_pricing.withColumn(
    "confidence_score",
    F.when(
        (F.col("elasticity_ci_width").isNotNull()) & (F.col("elasticity_ci_width") > 1.0),
        F.col("confidence_score") * 0.9  # Reduce confidence for wide CI
    ).otherwise(F.col("confidence_score"))
)

print("Revenue impact and confidence calculated")
print()
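
A worked example of the revenue-impact arithmetic may help when sanity-checking the output table. This sketch follows the same steps as the cell above but skips the two-decimal rounding of the price change; the function name is hypothetical.

```python
def projected_revenue_impact_30d(current_price, new_price, avg_daily_sales,
                                 elasticity=None):
    """30-day revenue delta from a price change, using a linear elasticity response."""
    price_change_pct = (new_price - current_price) / current_price * 100
    # Without elasticity data, quantity is assumed not to respond to price.
    qty_change_pct = elasticity * price_change_pct if elasticity is not None else 0.0
    expected_daily_qty = avg_daily_sales * (1 + qty_change_pct / 100)
    daily_impact = new_price * expected_daily_qty - current_price * avg_daily_sales
    return daily_impact * 30


with_elasticity = projected_revenue_impact_30d(20.0, 18.0, 5.0, elasticity=-2.0)
without_elasticity = projected_revenue_impact_30d(20.0, 18.0, 5.0)
print(with_elasticity, without_elasticity)
```

A 10% price cut with elasticity -2 lifts volume by 20%, so daily revenue moves from 100 to 108 and the 30-day projection is about +240. The same cut with no elasticity data is projected at -300, since quantity is held constant; LOW-confidence rows without elasticity will therefore always show markdowns as revenue losses.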

In [None]:
# =============================================================================
# CREATE OUTPUT TABLE
# =============================================================================

print("="*60)
print("CREATING PRICING RECOMMENDATIONS OUTPUT")
print("="*60)

# Add no-change reason for products that don't need markdown
df_pricing = df_pricing.withColumn(
    "reason_codes",
    F.when(
        F.size(F.col("reason_codes")) == 0,
        F.array(F.lit("NO_CHANGE"))
    ).otherwise(F.col("reason_codes"))
)

# Create final output with Phase 2 enhancements
df_output = df_pricing.select(
    "product_id",
    "ProductName",
    "Department",
    "Category",
    "Subcategory",
    F.col("Cost").alias("cost"),
    F.col("MSRP").alias("msrp"),
    F.col("SalePrice").alias("current_price"),
    F.col("recommended_price"),
    F.col("price_change_pct").alias("change_pct"),
    F.col("reason_codes"),
    F.col("markdown_factor"),
    F.col("total_inventory"),
    F.col("days_since_last_receipt"),
    F.col("avg_daily_sales"),
    F.col("days_of_inventory"),
    F.col("hit_min_margin"),
    F.col("hit_max_price"),
    F.col("violates_max_change"),
    # Phase 2 fields
    F.col("elasticity_coefficient"),
    F.col("elasticity_category"),
    F.col("projected_revenue_impact_30d"),
    F.col("confidence_level"),
    F.col("confidence_score"),
    F.lit("ELASTICITY_OPTIMIZED").alias("model_type"),
    F.col("confidence_score").alias("ml_confidence"),
    F.current_timestamp().alias("recommendation_ts"),
    F.lit("2.0").alias("schema_version")
)

save_gold(df_output, "gold_pricing_recommendations")
print()

In [None]:
# =============================================================================
# SUMMARY STATISTICS
# =============================================================================

print("="*60)
print("PRICING RECOMMENDATIONS SUMMARY")
print("="*60)

total_products = df_output.count()
print(f"\nTotal products analyzed: {total_products}")

# Count by reason code
print("\nRecommendations by reason:")
df_reasons = df_output.select(F.explode("reason_codes").alias("reason"))
df_reasons.groupBy("reason").count().orderBy(F.desc("count")).show(truncate=False)

# Count constraint violations
print("\nConstraint violations:")
min_margin_hits = df_output.filter(F.col("hit_min_margin")).count()
max_price_hits = df_output.filter(F.col("hit_max_price")).count()
max_change_hits = df_output.filter(F.col("violates_max_change")).count()

print(f"  Min margin: {min_margin_hits} products")
print(f"  Max price: {max_price_hits} products")
print(f"  Max change/week: {max_change_hits} products")

# Price change distribution
print("\nPrice change distribution:")
df_output.select(
    F.avg("change_pct").alias("avg_change_pct"),
    F.min("change_pct").alias("min_change_pct"),
    F.max("change_pct").alias("max_change_pct")
).show()

# Products with significant markdowns
significant_markdowns = df_output.filter(F.col("change_pct") < -5).count()
print(f"\nProducts with >5% markdown: {significant_markdowns}")

# Phase 2: Elasticity-based recommendations
print("\n" + "="*60)
print("PHASE 2: ELASTICITY-BASED OPTIMIZATION SUMMARY")
print("="*60)

elasticity_used = df_output.filter(F.col("elasticity_coefficient").isNotNull()).count()
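# Guard against an empty output table to avoid division by zero
elasticity_pct = elasticity_used * 100 / total_products if total_products else 0.0
print(f"\nProducts with elasticity data: {elasticity_used} ({elasticity_pct:.1f}%)")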

print("\nConfidence level distribution:")
df_output.groupBy("confidence_level").count().orderBy(F.desc("count")).show(truncate=False)

print("\nElasticity category distribution:")
df_output.filter(F.col("elasticity_category").isNotNull()).groupBy("elasticity_category").count().orderBy(F.desc("count")).show(truncate=False)

print("\nRevenue impact summary (30-day projection):")
df_output.select(
    F.sum("projected_revenue_impact_30d").alias("total_revenue_impact"),
    F.avg("projected_revenue_impact_30d").alias("avg_revenue_impact_per_product"),
    F.max("projected_revenue_impact_30d").alias("max_revenue_impact")
).show()

print("\nTop 10 products by expected revenue impact:")
df_output.orderBy(F.desc("projected_revenue_impact_30d")).select(
    "product_id",
    "ProductName",
    "current_price",
    "recommended_price",
    "change_pct",
    "projected_revenue_impact_30d",
    "confidence_level"
).show(10, truncate=False)

print("\n" + "="*60)
print("DYNAMIC PRICING PHASE 2 COMPLETE")
print("="*60)
print("\nCompleted features:")
print("  ✓ Phase 1: Rule-based markdown engine")
print("  ✓ Phase 2: Price elasticity integration")
print("  ✓ Phase 2: Elasticity-based optimization")
print("  ✓ Phase 2: Inventory urgency adjustments")
print("  ✓ Phase 2: Revenue impact estimation")
print("  ✓ Phase 2: Confidence level scoring")
print("\nNext steps:")
print("  - Review recommendations in gold_pricing_recommendations table")
print("  - Monitor revenue impact vs actuals")
print("  - Phase 3: Add competitor price intelligence")
print("  - Phase 3: Add multi-product bundle optimization")