# Enhanced Triplet Association Analysis - Instacart Dataset
## BSc Thesis: Market Basket Analysis using Databricks

**Focus:** Deep analysis of 3-item purchase patterns (triplets) with efficiency optimizations

**Why Triplets?**
- Pairs capture only two-item associations and miss richer basket context
- Quadruplets and larger sets are computationally expensive (a four-way or larger self-join of the order-items table)
- Triplets offer the sweet spot: rich patterns with manageable compute
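
To make the cost argument concrete, here is a minimal plain-Python sketch that counts how many k-item sets a single basket contains. The basket size of 10 is an illustrative assumption, not a figure measured from the Instacart data, and `combos_per_basket` is a hypothetical helper name.

```python
from math import comb

def combos_per_basket(basket_size: int, k: int) -> int:
    """Number of distinct k-item sets contained in one basket."""
    return comb(basket_size, k)

# Illustrative basket of 10 items (assumed size, not taken from the dataset)
basket_size = 10
for k, label in [(2, "pairs"), (3, "triplets"), (4, "quadruplets")]:
    print(f"{label}: {combos_per_basket(basket_size, k)} candidate sets")
```

Every candidate set has to be generated and grouped for every order, so the per-basket count gives a rough feel for how the intermediate join size grows with each extra item.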

---

## Part 1: Setup and Data Quality Check

In [0]:
from pyspark.sql.functions import col, count, collect_list, collect_set, round as spark_round, desc, asc
import pandas as pd

print("=" * 80)
print("TRIPLET MARKET BASKET ANALYSIS - Enhanced Version")
print("Pure SQL Implementation (Spark Connect Compatible)")
print("=" * 80)

# Data quality check
def check_data_quality():
    print("\n→ Checking data quality...")
    
    quality_check = spark.sql("""
        SELECT 
            COUNT(*) as total_products,
            COUNT(CASE WHEN TRY_CAST(department_id AS BIGINT) IS NULL THEN 1 END) as invalid_dept_ids,
            COUNT(CASE WHEN TRY_CAST(aisle_id AS BIGINT) IS NULL THEN 1 END) as invalid_aisle_ids,
            COUNT(DISTINCT TRY_CAST(department_id AS BIGINT)) as total_departments,
            COUNT(DISTINCT TRY_CAST(aisle_id AS BIGINT)) as total_aisles
        FROM workspace.instacart.products
    """)
    
    result = quality_check.collect()[0]
    print(f"   Total Products: {result['total_products']:,}")
    print(f"   Total Departments: {result['total_departments']}")
    print(f"   Total Aisles: {result['total_aisles']}")
    print(f"   Invalid Department IDs: {result['invalid_dept_ids']}")
    print(f"   Invalid Aisle IDs: {result['invalid_aisle_ids']}")
    
    if result['invalid_dept_ids'] > 0 or result['invalid_aisle_ids'] > 0:
        print("   ⚠ Some products have invalid IDs - they will be excluded from categorical analysis")
    
    return result

# Basic order statistics
def check_order_statistics():
    print("\n→ Order statistics...")
    
    order_stats = spark.sql("""
        SELECT 
            COUNT(DISTINCT op.order_id) as total_orders,
            COUNT(DISTINCT o.user_id) as total_users,
            COUNT(*) as total_order_items
        FROM workspace.instacart.order_products_prior op
        JOIN workspace.instacart.orders o ON op.order_id = o.order_id
    """)
    
    result = order_stats.collect()[0]
    print(f"   Total Orders: {result['total_orders']:,}")
    print(f"   Total Users: {result['total_users']:,}")
    print(f"   Total Items Ordered: {result['total_order_items']:,}")
    print(f"   Avg Items per Order: {result['total_order_items'] / result['total_orders']:.2f}")
    
    return result

check_data_quality()
check_order_statistics()
print("=" * 80)

## Part 2: Product Pairs Analysis (Foundation)
We start with pairs to understand basic associations before moving to triplets.
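
As a reference for the metrics used in this part, the sketch below computes support, confidence, and lift for one pair on a toy set of orders. The orders are hypothetical, and `pair_metrics` is an illustrative helper written in plain Python, not the Spark SQL implementation used in the notebook; the formulas follow the same count-based definitions.

```python
from collections import Counter

# Toy orders (hypothetical data for illustration only)
orders = [
    {"banana", "milk", "bread"},
    {"banana", "milk"},
    {"banana", "bread"},
    {"milk", "eggs"},
]

def pair_metrics(orders, a, b):
    """Return (support, confidence a->b, lift) for the pair (a, b)."""
    n = len(orders)
    # Number of orders containing each individual item
    item_counts = Counter(item for order in orders for item in order)
    # Number of orders containing both items
    pair_count = sum(1 for order in orders if a in order and b in order)
    support = pair_count / n
    confidence = pair_count / item_counts[a]
    lift = pair_count * n / (item_counts[a] * item_counts[b])
    return support, confidence, lift

support, confidence, lift = pair_metrics(orders, "banana", "milk")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift below 1 on this toy data means the two items appear together less often than independence would predict; values above 1 indicate a positive association.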

In [0]:
def analyze_product_pairs(min_support_count=100):
    """
    Find product pairs with support, confidence, and lift metrics.
    This serves as the foundation for triplet analysis.
    """
    print(f"\n→ Analyzing product pairs (min support: {min_support_count})...")
    
    query = f"""
    WITH product_pairs AS (
        SELECT 
            op1.product_id as prod1,
            op2.product_id as prod2,
            COUNT(DISTINCT op1.order_id) as co_occurrence
        FROM workspace.instacart.order_products_prior op1
        JOIN workspace.instacart.order_products_prior op2 
            ON op1.order_id = op2.order_id AND op1.product_id < op2.product_id
        GROUP BY op1.product_id, op2.product_id
        HAVING COUNT(DISTINCT op1.order_id) >= {min_support_count}
    ),
    product_support AS (
        SELECT 
            product_id,
            COUNT(DISTINCT order_id) as support_count
        FROM workspace.instacart.order_products_prior
        GROUP BY product_id
    ),
    total_orders AS (
        SELECT COUNT(DISTINCT order_id) as total 
        FROM workspace.instacart.order_products_prior
    )
    SELECT 
        p1.product_name as product_1,
        p2.product_name as product_2,
        pp.co_occurrence,
        ROUND(pp.co_occurrence * 100.0 / to.total, 4) as support_pct,
        ROUND(pp.co_occurrence * 1.0 / ps1.support_count, 4) as confidence_1_to_2,
        ROUND(pp.co_occurrence * 1.0 / ps2.support_count, 4) as confidence_2_to_1,
        ROUND(pp.co_occurrence * 1.0 * to.total / (ps1.support_count * ps2.support_count), 2) as lift,
        d1.department as dept_1,
        d2.department as dept_2,
        a1.aisle as aisle_1,
        a2.aisle as aisle_2
    FROM product_pairs pp
    CROSS JOIN total_orders to
    JOIN workspace.instacart.products p1 ON pp.prod1 = p1.product_id
    JOIN workspace.instacart.products p2 ON pp.prod2 = p2.product_id
    JOIN product_support ps1 ON pp.prod1 = ps1.product_id
    JOIN product_support ps2 ON pp.prod2 = ps2.product_id
    LEFT JOIN workspace.instacart.departments d1 ON TRY_CAST(p1.department_id AS BIGINT) = d1.department_id
    LEFT JOIN workspace.instacart.departments d2 ON TRY_CAST(p2.department_id AS BIGINT) = d2.department_id
    LEFT JOIN workspace.instacart.aisles a1 ON TRY_CAST(p1.aisle_id AS BIGINT) = a1.aisle_id
    LEFT JOIN workspace.instacart.aisles a2 ON TRY_CAST(p2.aisle_id AS BIGINT) = a2.aisle_id
    ORDER BY lift DESC
    LIMIT 200
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("product_pairs_analysis")
    print(f"   Found {df.count()} high-quality product pairs")
    return df

# Execute pairs analysis
pairs_df = analyze_product_pairs(min_support_count=100)
print("\n✓ Pairs analysis complete. View created: product_pairs_analysis")

In [0]:
# Display top pairs
print("\n🔥 TOP 20 PRODUCT PAIRS BY LIFT:")
display(pairs_df.orderBy(desc('lift')).limit(20))

## Part 3: Core Triplet Analysis
### 3.1 Basic Triplet Patterns
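
The idea behind the triplet query can be sketched in plain Python: restrict each order to popular products, then enumerate sorted 3-item combinations so that each triplet is counted once per order (the same role the `product_id <` join conditions play in SQL). The orders, thresholds, and the `count_triplets` helper are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Toy orders as lists of product IDs (hypothetical data)
orders = [
    [10, 20, 30, 40],
    [10, 20, 30],
    [20, 30, 40],
    [10, 50],
]

def count_triplets(orders, min_product_orders, min_support_count):
    """Count triplets of popular products that co-occur in enough orders."""
    # Orders per product, counting each product at most once per order
    product_orders = Counter(p for order in orders for p in set(order))
    popular = {p for p, c in product_orders.items() if c >= min_product_orders}

    triplet_counts = Counter()
    for order in orders:
        # Sorting gives prod1 < prod2 < prod3, so each triplet has one canonical form
        kept = sorted(set(order) & popular)
        triplet_counts.update(combinations(kept, 3))

    return {t: c for t, c in triplet_counts.items() if c >= min_support_count}

print(count_triplets(orders, min_product_orders=2, min_support_count=2))
```

Filtering to popular products before enumerating combinations is what keeps the candidate space small; rare products cannot reach the support threshold anyway.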

In [0]:
def analyze_triplets_optimized(min_support_count=50):
    """
    Optimized: only analyze triplets built from popular products.
    Reduces the search space from roughly 50K³ to at most 1,000³ product combinations.
    """
    print(f"\n→ Analyzing triplets (optimized, min support: {min_support_count})...")
    
    query = f"""
    WITH popular_products AS (
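        -- Keep the 1,000 most frequently ordered products that appear in at least 150 orders
        SELECT product_id
        FROM workspace.instacart.order_products_prior
        GROUP BY product_id
        HAVING COUNT(DISTINCT order_id) >= 150
        -- Without ORDER BY, LIMIT would keep an arbitrary subset rather than the most popular products
        ORDER BY COUNT(DISTINCT order_id) DESC
        LIMIT 1000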
    ),
    triplet_combos AS (
        SELECT 
            op1.product_id as prod1,
            op2.product_id as prod2,
            op3.product_id as prod3,
            COUNT(DISTINCT op1.order_id) as triplet_count
        FROM workspace.instacart.order_products_prior op1
        INNER JOIN popular_products pp1 ON op1.product_id = pp1.product_id
        JOIN workspace.instacart.order_products_prior op2 
            ON op1.order_id = op2.order_id AND op1.product_id < op2.product_id
        INNER JOIN popular_products pp2 ON op2.product_id = pp2.product_id
        JOIN workspace.instacart.order_products_prior op3
            ON op1.order_id = op3.order_id AND op2.product_id < op3.product_id
        INNER JOIN popular_products pp3 ON op3.product_id = pp3.product_id
        GROUP BY op1.product_id, op2.product_id, op3.product_id
        HAVING COUNT(DISTINCT op1.order_id) >= {min_support_count}
    ),
    total_orders AS (
        SELECT COUNT(DISTINCT order_id) as total 
        FROM workspace.instacart.order_products_prior
    )
    SELECT 
        p1.product_name as product_1,
        p2.product_name as product_2,
        p3.product_name as product_3,
        tc.triplet_count,
        ROUND(tc.triplet_count * 100.0 / to.total, 4) as support_pct,
        d1.department as dept_1,
        d2.department as dept_2,
        d3.department as dept_3,
        CASE 
            WHEN d1.department = d2.department AND d2.department = d3.department 
            THEN 'Same Department'
            WHEN d1.department != d2.department AND d2.department != d3.department AND d1.department != d3.department
            THEN 'All Different Departments'
            ELSE 'Mixed Departments'
        END as department_diversity
    FROM triplet_combos tc
    CROSS JOIN total_orders to
    JOIN workspace.instacart.products p1 ON tc.prod1 = p1.product_id
    JOIN workspace.instacart.products p2 ON tc.prod2 = p2.product_id
    JOIN workspace.instacart.products p3 ON tc.prod3 = p3.product_id
    LEFT JOIN workspace.instacart.departments d1 ON TRY_CAST(p1.department_id AS BIGINT) = d1.department_id
    LEFT JOIN workspace.instacart.departments d2 ON TRY_CAST(p2.department_id AS BIGINT) = d2.department_id
    LEFT JOIN workspace.instacart.departments d3 ON TRY_CAST(p3.department_id AS BIGINT) = d3.department_id
    ORDER BY triplet_count DESC
    LIMIT 100
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("triplet_patterns_basic")
    print(f"   Found {df.count()} triplet patterns")
    return df

# Execute
triplets_df = analyze_triplets_optimized(min_support_count=50)
print("\n✓ Basic triplet analysis complete. View created: triplet_patterns_basic")

In [0]:
# Display top triplets
print("\n🎯 TOP 20 TRIPLET PATTERNS:")
display(triplets_df.limit(20))

### 3.2 Triplet Confidence and Lift Metrics (NEW!)
Calculate how much each pair → third item relationship matters
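
The two triplet metrics can be checked by hand on a small example. The sketch below uses hypothetical orders and an illustrative `triplet_metrics` helper in plain Python; it mirrors the count-based formulas for confidence, P(prod3 | prod1 ∧ prod2), and lift, but it is not the Spark implementation.

```python
# Toy orders as sets of product IDs (hypothetical data)
orders = [
    {1, 2, 3},
    {1, 2, 3},
    {1, 2},
    {3, 4},
    {1, 4},
]

def triplet_metrics(orders, a, b, c):
    """Return (triplet_count, confidence of (a, b) -> c, lift)."""
    n = len(orders)
    pair_count = sum(1 for o in orders if a in o and b in o)
    item_count = sum(1 for o in orders if c in o)
    triplet_count = sum(1 for o in orders if {a, b, c} <= o)
    # Confidence: share of orders with the pair that also contain the third item
    confidence = triplet_count / pair_count
    # Lift: observed triplet frequency relative to pair and item being independent
    lift = triplet_count * n / (pair_count * item_count)
    return triplet_count, confidence, lift

count, confidence, lift = triplet_metrics(orders, 1, 2, 3)
print(f"count={count} confidence={confidence:.3f} lift={lift:.3f}")
```

Because triplets are stored with ordered product IDs, only the direction (prod1, prod2) → prod3 is scored this way; the other two directions would need the corresponding pair and item counts swapped in.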

In [0]:
def analyze_triplet_confidence(min_support_count=50):
    """
    Calculate triplet-specific metrics:
    - Confidence: P(prod3 | prod1 ∧ prod2)
    - Lift: How much more likely is prod3 given prod1 and prod2 together
    - Incremental value over pairs
    """
    print(f"\n→ Calculating triplet confidence and lift metrics...")
    
    query = f"""
    WITH triplet_combos AS (
        SELECT 
            op1.product_id as prod1,
            op2.product_id as prod2,
            op3.product_id as prod3,
            COUNT(DISTINCT op1.order_id) as triplet_count
        FROM workspace.instacart.order_products_prior op1
        JOIN workspace.instacart.order_products_prior op2 
            ON op1.order_id = op2.order_id AND op1.product_id < op2.product_id
        JOIN workspace.instacart.order_products_prior op3
            ON op1.order_id = op3.order_id AND op2.product_id < op3.product_id
        GROUP BY op1.product_id, op2.product_id, op3.product_id
        HAVING COUNT(DISTINCT op1.order_id) >= {min_support_count}
    ),
    pair_support AS (
        SELECT 
            op1.product_id as prod1,
            op2.product_id as prod2,
            COUNT(DISTINCT op1.order_id) as pair_count
        FROM workspace.instacart.order_products_prior op1
        JOIN workspace.instacart.order_products_prior op2 
            ON op1.order_id = op2.order_id AND op1.product_id < op2.product_id
        GROUP BY op1.product_id, op2.product_id
    ),
    product_support AS (
        SELECT 
            product_id,
            COUNT(DISTINCT order_id) as support_count
        FROM workspace.instacart.order_products_prior
        GROUP BY product_id
    ),
    total_orders AS (
        SELECT COUNT(DISTINCT order_id) as total 
        FROM workspace.instacart.order_products_prior
    )
    SELECT 
        p1.product_name as product_1,
        p2.product_name as product_2,
        p3.product_name as product_3,
        tc.triplet_count,
        ps12.pair_count as pair_12_count,
        -- Confidence: P(prod3 | prod1 AND prod2)
        ROUND(tc.triplet_count * 1.0 / ps12.pair_count, 4) as confidence_12_to_3,
        -- Lift: (P(prod1, prod2, prod3)) / (P(prod1, prod2) * P(prod3))
        ROUND(tc.triplet_count * 1.0 * to.total / (ps12.pair_count * ps3.support_count), 2) as triplet_lift,
        -- Support percentage
        ROUND(tc.triplet_count * 100.0 / to.total, 4) as support_pct,
        d1.department as dept_1,
        d2.department as dept_2,
        d3.department as dept_3
    FROM triplet_combos tc
    CROSS JOIN total_orders to
    JOIN pair_support ps12 ON tc.prod1 = ps12.prod1 AND tc.prod2 = ps12.prod2
    JOIN product_support ps3 ON tc.prod3 = ps3.product_id
    JOIN workspace.instacart.products p1 ON tc.prod1 = p1.product_id
    JOIN workspace.instacart.products p2 ON tc.prod2 = p2.product_id
    JOIN workspace.instacart.products p3 ON tc.prod3 = p3.product_id
    LEFT JOIN workspace.instacart.departments d1 ON TRY_CAST(p1.department_id AS BIGINT) = d1.department_id
    LEFT JOIN workspace.instacart.departments d2 ON TRY_CAST(p2.department_id AS BIGINT) = d2.department_id
    LEFT JOIN workspace.instacart.departments d3 ON TRY_CAST(p3.department_id AS BIGINT) = d3.department_id
    ORDER BY triplet_lift DESC
    LIMIT 100
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("triplet_confidence_lift")
    print(f"   Calculated confidence/lift for {df.count()} triplets")
    return df

# Execute confidence analysis
triplet_metrics_df = analyze_triplet_confidence(min_support_count=50)
print("\n✓ Triplet confidence/lift analysis complete. View created: triplet_confidence_lift")

In [0]:
# Display highest lift triplets
print("\n💎 TOP 20 TRIPLETS BY LIFT (Strongest 3-way associations):")
display(triplet_metrics_df.limit(20))

### 3.3 Department Diversity in Triplets
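
The three diversity labels follow a simple rule, sketched here in plain Python. The `department_diversity` function is an illustrative stand-in for the SQL `CASE` expression and assumes all three department names are known (non-null); the department names in the example are hypothetical inputs.

```python
def department_diversity(d1: str, d2: str, d3: str) -> str:
    """Classify a triplet by how many distinct departments it spans."""
    distinct = len({d1, d2, d3})
    if distinct == 1:
        return "Same Department"
    if distinct == 3:
        return "All Different Departments"
    # Exactly two of the three products share a department
    return "Mixed Departments"

print(department_diversity("produce", "produce", "produce"))
print(department_diversity("produce", "dairy eggs", "bakery"))
print(department_diversity("produce", "produce", "dairy eggs"))
```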

In [0]:
def analyze_department_diversity():
    """
    Analyze how triplets span across departments.
    Important for understanding cross-category shopping behavior.
    """
    print("\n→ Analyzing department diversity in triplets...")
    
    query = """
    SELECT 
        department_diversity,
        COUNT(*) as pattern_count,
        ROUND(AVG(triplet_count), 2) as avg_occurrence,
        ROUND(AVG(support_pct), 4) as avg_support_pct,
        ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as pct_of_patterns
    FROM triplet_patterns_basic
    GROUP BY department_diversity
    ORDER BY pattern_count DESC
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("triplet_dept_diversity")
    return df

dept_diversity_df = analyze_department_diversity()
print("\n📊 DEPARTMENT DIVERSITY IN TRIPLETS:")
display(dept_diversity_df)

### 3.4 Top Triplets by Department Combination (NEW!)

In [0]:
def analyze_cross_department_triplets():
    """
    Find the most interesting cross-department triplets.
    These reveal complementary purchase patterns across categories.
    """
    print("\n→ Finding cross-department triplets...")
    
    query = """
    SELECT 
        product_1,
        product_2,
        product_3,
        triplet_count,
        support_pct,
        dept_1,
        dept_2,
        dept_3,
        CONCAT(dept_1, ' + ', dept_2, ' + ', dept_3) as dept_combination
    FROM triplet_patterns_basic
    WHERE department_diversity = 'All Different Departments'
    ORDER BY triplet_count DESC
    LIMIT 30
    """
    
    df = spark.sql(query)
    return df

cross_dept_triplets = analyze_cross_department_triplets()
print("\n🌉 TOP 30 CROSS-DEPARTMENT TRIPLETS:")
display(cross_dept_triplets)

## Part 4: Temporal Patterns (Pair-Based)
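
The temporal analysis in this part buckets each order by hour of day and day of week before counting co-purchases. The sketch below restates that bucketing in plain Python with the same hour boundaries and labels; the helper names are hypothetical, and treating `order_dow` values 0 and 6 as the weekend is an assumption carried over from the query.

```python
def time_period(hour: int) -> str:
    """Bucket an hour of day (0-23) into a named period."""
    if 6 <= hour <= 11:
        return "Morning (6-11am)"
    if 12 <= hour <= 17:
        return "Afternoon (12-5pm)"
    if 18 <= hour <= 21:
        return "Evening (6-9pm)"
    return "Night (10pm-5am)"

def day_type(order_dow: int) -> str:
    """Assumes day-of-week codes 0 and 6 are the weekend days."""
    return "Weekend" if order_dow in (0, 6) else "Weekday"

def shopping_context(order_dow: int, hour: int) -> str:
    """Combine both buckets into one label."""
    return f"{day_type(order_dow)} - {time_period(hour)}"

print(shopping_context(6, 19))
print(shopping_context(2, 8))
```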

In [0]:
def analyze_temporal_patterns_simple():
    """
    Simplified temporal analysis - uses pairs instead of triplets
    Much faster and still gives valuable insights
    """
    print("\n→ Analyzing temporal patterns (pairs - compute optimized)...")
    
    query = """
    WITH time_pairs AS (
        SELECT 
            CASE 
                WHEN o.order_hour_of_day BETWEEN 6 AND 11 THEN 'Morning (6-11am)'
                WHEN o.order_hour_of_day BETWEEN 12 AND 17 THEN 'Afternoon (12-5pm)'
                WHEN o.order_hour_of_day BETWEEN 18 AND 21 THEN 'Evening (6-9pm)'
                ELSE 'Night (10pm-5am)'
            END as time_period,
            -- Assumes order_dow 0 and 6 are the weekend days; the dataset does not document the day mapping
            CASE 
                WHEN o.order_dow IN (0, 6) THEN 'Weekend'
                ELSE 'Weekday'
            END as day_type,
            op1.product_id as prod1,
            op2.product_id as prod2,
            COUNT(DISTINCT o.order_id) as pair_count
        FROM workspace.instacart.orders o
        JOIN workspace.instacart.order_products_prior op1 ON o.order_id = op1.order_id
        JOIN workspace.instacart.order_products_prior op2 
            ON o.order_id = op2.order_id AND op1.product_id < op2.product_id
        WHERE o.eval_set = 'prior'
        GROUP BY time_period, day_type, op1.product_id, op2.product_id
        HAVING COUNT(DISTINCT o.order_id) >= 100
    )
    SELECT 
        tp.time_period,
        tp.day_type,
        p1.product_name as product_1,
        p2.product_name as product_2,
        tp.pair_count,
        CONCAT(tp.day_type, ' - ', tp.time_period) as shopping_context
    FROM time_pairs tp
    JOIN workspace.instacart.products p1 ON tp.prod1 = p1.product_id
    JOIN workspace.instacart.products p2 ON tp.prod2 = p2.product_id
    ORDER BY tp.pair_count DESC
    LIMIT 200
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("temporal_pair_patterns")
    print(f"   Found {df.count()} temporal pair patterns")
    return df

# Execute
temporal_df = analyze_temporal_patterns_simple()
print("\n✓ Temporal analysis complete. View created: temporal_pair_patterns")

In [0]:
# Display by shopping context
print("\n⏰ WEEKEND EVENING PATTERNS:")
display(spark.sql("""
    SELECT * FROM temporal_pair_patterns
    WHERE day_type = 'Weekend' AND time_period = 'Evening (6-9pm)'
    ORDER BY pair_count DESC
    LIMIT 15
"""))

print("\n☀️ WEEKDAY MORNING PATTERNS:")
display(spark.sql("""
    SELECT * FROM temporal_pair_patterns
    WHERE day_type = 'Weekday' AND time_period = 'Morning (6-11am)'
    ORDER BY pair_count DESC
    LIMIT 15
"""))

## Part 5: Reorder Loyalty in Triplets
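
As a rough illustration of what "reordered together" means here, the plain-Python sketch below counts a triplet only in orders where all three of its items carry the reorder flag. The rows are hypothetical `(order_id, product_id, reordered)` tuples and `reordered_triplets` is an illustrative helper, not the notebook's Spark query.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order lines: (order_id, product_id, reordered flag)
rows = [
    (1, "A", 1), (1, "B", 1), (1, "C", 1),
    (2, "A", 1), (2, "B", 1), (2, "C", 0),  # C is a first-time purchase here
    (3, "A", 1), (3, "B", 1), (3, "C", 1),
]

def reordered_triplets(rows, min_support):
    """Count triplets whose three items were all reordered in the same order."""
    reordered_by_order = defaultdict(set)
    for order_id, product_id, reordered in rows:
        if reordered == 1:
            reordered_by_order[order_id].add(product_id)

    counts = Counter()
    for products in reordered_by_order.values():
        # Sorted combinations give one canonical form per triplet
        counts.update(combinations(sorted(products), 3))

    return {t: c for t, c in counts.items() if c >= min_support}

print(reordered_triplets(rows, min_support=2))
```

Order 2 does not contribute because one of its three items was not a reorder, which matches the strict all-three-reordered filter.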

In [0]:
def analyze_triplet_reorder_loyalty(min_support=30):
    """
    Find triplets that are frequently reordered together.
    These represent strong habitual purchase patterns.
    """
    print(f"\n→ Analyzing reorder loyalty for triplets (min support: {min_support})...")
    
    query = f"""
    WITH reordered_triplets AS (
        SELECT 
            op1.product_id as prod1,
            op2.product_id as prod2,
            op3.product_id as prod3,
            COUNT(DISTINCT op1.order_id) as reorder_together_count
        FROM workspace.instacart.order_products_prior op1
        JOIN workspace.instacart.order_products_prior op2 
            ON op1.order_id = op2.order_id AND op1.product_id < op2.product_id
        JOIN workspace.instacart.order_products_prior op3
            ON op1.order_id = op3.order_id AND op2.product_id < op3.product_id
        WHERE op1.reordered = 1 AND op2.reordered = 1 AND op3.reordered = 1
        GROUP BY op1.product_id, op2.product_id, op3.product_id
        HAVING COUNT(DISTINCT op1.order_id) >= {min_support}
    )
    SELECT 
        p1.product_name as product_1,
        p2.product_name as product_2,
        p3.product_name as product_3,
        rt.reorder_together_count,
        d1.department as dept_1,
        d2.department as dept_2,
        d3.department as dept_3,
        CASE 
            WHEN d1.department = d2.department AND d2.department = d3.department 
            THEN 'Same Department Loyalty'
            ELSE 'Cross Department Loyalty'
        END as loyalty_type
    FROM reordered_triplets rt
    JOIN workspace.instacart.products p1 ON rt.prod1 = p1.product_id
    JOIN workspace.instacart.products p2 ON rt.prod2 = p2.product_id
    JOIN workspace.instacart.products p3 ON rt.prod3 = p3.product_id
    LEFT JOIN workspace.instacart.departments d1 ON TRY_CAST(p1.department_id AS BIGINT) = d1.department_id
    LEFT JOIN workspace.instacart.departments d2 ON TRY_CAST(p2.department_id AS BIGINT) = d2.department_id
    LEFT JOIN workspace.instacart.departments d3 ON TRY_CAST(p3.department_id AS BIGINT) = d3.department_id
    WHERE d1.department IS NOT NULL AND d2.department IS NOT NULL AND d3.department IS NOT NULL
    ORDER BY reorder_together_count DESC
    LIMIT 50
    """
    
    df = spark.sql(query)
    df.createOrReplaceTempView("triplet_reorder_patterns")
    print(f"   Found {df.count()} loyal triplet patterns")
    return df

reorder_triplets_df = analyze_triplet_reorder_loyalty(min_support=30)
print("\n✓ Reorder loyalty analysis complete. View created: triplet_reorder_patterns")

In [0]:
print("\n🔄 TOP 30 REORDERED TRIPLETS (Habitual Purchase Patterns):")
display(reorder_triplets_df.limit(30))

## Part 6: Summary Statistics

In [0]:
def generate_summary_statistics():
    """
    Generate comprehensive summary statistics for the thesis
    """
    print("\n" + "=" * 80)
    print("SUMMARY STATISTICS - TRIPLET ANALYSIS")
    print("=" * 80)
    
    # Count of patterns
    pairs_count = spark.table("product_pairs_analysis").count()
    triplets_count = spark.table("triplet_patterns_basic").count()
    
    print(f"\n📊 Pattern Counts:")
    print(f"   Product Pairs: {pairs_count:,}")
    print(f"   Triplets: {triplets_count:,}")
    
    # Department diversity breakdown
    dept_div = spark.sql("""
        SELECT department_diversity, COUNT(*) as count
        FROM triplet_patterns_basic
        GROUP BY department_diversity
        ORDER BY count DESC
    """).collect()
    
    print(f"\n📦 Department Diversity:")
    for row in dept_div:
        print(f"   {row['department_diversity']}: {row['count']} patterns")
    
    # Support ranges
    support_stats = spark.sql("""
        SELECT 
            MIN(triplet_count) as min_support,
            MAX(triplet_count) as max_support,
            ROUND(AVG(triplet_count), 2) as avg_support,
            PERCENTILE(triplet_count, 0.5) as median_support
        FROM triplet_patterns_basic
    """).collect()[0]
    
    print(f"\n📈 Support Statistics:")
    print(f"   Min Support: {support_stats['min_support']}")
    print(f"   Max Support: {support_stats['max_support']}")
    print(f"   Avg Support: {support_stats['avg_support']}")
    print(f"   Median Support: {support_stats['median_support']}")
    
    print("\n" + "=" * 80)

generate_summary_statistics()

## Part 7: Export Results for Further Analysis

In [0]:
# Export top patterns to Pandas for visualization or further analysis
def export_key_results():
    print("\n→ Exporting key results to Pandas DataFrames...")
    
    # Top triplets
    top_triplets_pd = triplets_df.limit(50).toPandas()
    
    # Triplets with metrics
    triplet_metrics_pd = triplet_metrics_df.limit(50).toPandas()
    
    # Department diversity
    dept_diversity_pd = dept_diversity_df.toPandas()
    
    print(f"   Exported {len(top_triplets_pd)} top triplets to Pandas")
    print(f"   Exported {len(triplet_metrics_pd)} triplets with metrics to Pandas")
    print(f"   Exported department diversity summary to Pandas")
    
    return top_triplets_pd, triplet_metrics_pd, dept_diversity_pd

# Uncomment to export:
# top_triplets_pd, triplet_metrics_pd, dept_diversity_pd = export_key_results()

## Analysis Complete!

### Created Views:
1. `product_pairs_analysis` - Product pair associations
2. `triplet_patterns_basic` - Basic triplet patterns
3. `triplet_confidence_lift` - Triplet confidence and lift metrics
4. `triplet_dept_diversity` - Department diversity summary
5. `temporal_pair_patterns` - Time-based pair patterns
6. `triplet_reorder_patterns` - Loyalty triplet patterns

### Next Steps for Thesis:
1. **Visualization** - Create charts showing top patterns
2. **Statistical Analysis** - Perform significance tests
3. **Business Insights** - Interpret patterns for recommendations
4. **Comparison** - Compare with pairs to show triplet value
5. **Documentation** - Write up methodology and findings