# D11 Bending Stiffness Comparison - TEST RUN (1000 Pits)

**This is a TEST version that processes only 1,000 pits (instead of ~50,000)**

This notebook compares D11 (bending stiffness) values calculated via all possible parameterization pathways for ECTP slabs in the snow pilot dataset.

## Test Configuration

- **Pits to process**: 1,000 (randomly sampled)
- **Expected ECTP slabs**: ~300-400 (a 24.6% ECTP rate gives ~250 pits with ECTP results; a pit can yield more than one slab)
- **Expected pathway executions**: ~300 × 32 = ~9,600
- **Estimated runtime**: 2-4 minutes

## Goals

1. Execute all pathways for each ECTP slab
2. Analyze data loss along different pathways
3. Compare D11 statistics by pathway
4. Compare D11 statistics across pathways (per slab)
5. Identify sources of variability

## D11 Calculation

D11 (bending stiffness) requires:
- **Density** (ρ) for elastic modulus calculation (4 methods)
- **Elastic modulus** (E) on all layers (4 methods)
- **Poisson's ratio** (ν) on all layers (2 methods)
- **Layer positions** (depth_top, thickness) - already available
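For reference, the inputs listed above combine in the standard classical laminate theory form D11 = Σ E_i / (1 − ν_i²) · (z_bot³ − z_top³) / 3, with z measured from the slab midplane. The sketch below assumes that formulation; snowpyt_mechparams may reference a different plane or include coupling terms, so treat it as illustrative only. `LayerProps` and `bending_stiffness_D11` are hypothetical names, and units are assumed to be mm and MPa (so D11 comes out in N·mm; slab thicknesses recorded in cm would need converting first).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LayerProps:
    """Hypothetical per-layer inputs needed for D11 (not a snowpyt_mechparams class)."""
    thickness_mm: float  # layer thickness (mm)
    E_MPa: float         # elastic modulus (MPa = N/mm^2)
    nu: float            # Poisson's ratio (dimensionless)


def bending_stiffness_D11(layers: list[LayerProps]) -> float:
    """Bending stiffness D11 (N·mm) from classical laminate theory.

    Assumes layers are contiguous and ordered top to bottom, and measures
    z from the slab midplane (z increases downward).
    """
    total_mm = sum(layer.thickness_mm for layer in layers)
    z_top = -total_mm / 2.0  # top surface of the first layer
    D11 = 0.0
    for layer in layers:
        z_bot = z_top + layer.thickness_mm
        # Plane-strain modulus E / (1 - nu^2)
        Q11 = layer.E_MPa / (1.0 - layer.nu ** 2)
        D11 += Q11 * (z_bot ** 3 - z_top ** 3) / 3.0
        z_top = z_bot
    return D11


# Homogeneous check: should reduce to E h^3 / (12 (1 - nu^2))
print(bending_stiffness_D11([LayerProps(thickness_mm=100.0, E_MPa=1.0, nu=0.0)]))
```

For a single homogeneous layer this reduces to the familiar plate stiffness E·h³ / (12(1 − ν²)), which is a quick sanity check on any implementation.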

Number of pathways = (# density methods) × (# E methods) × (# ν methods) = **32 unique pathways**

Note: Poisson's ratio (Srivastava method) uses hand hardness + grain form directly, NOT calculated density.
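Because the three method choices are independent, the pathways are the Cartesian product of the method lists. A minimal sketch with `itertools.product`; the method labels are hypothetical placeholders, and `find_parameterizations` (used below) presumably derives the real set from the graph rather than from hard-coded lists.

```python
from itertools import product

# Hypothetical placeholder labels; the real method names live in the parameterization graph
density_methods = ["density_A", "density_B", "density_C", "density_D"]
modulus_methods = ["E_A", "E_B", "E_C", "E_D"]
poisson_methods = ["nu_A", "nu_B"]


def enumerate_pathways(rho_methods, e_methods, nu_methods):
    """Every (density, E, nu) combination: one tuple per D11 pathway."""
    return list(product(rho_methods, e_methods, nu_methods))


pathways = enumerate_pathways(density_methods, modulus_methods, poisson_methods)
print(len(pathways))  # 32
print(pathways[0])    # ('density_A', 'E_A', 'nu_A')
```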

## 1. Setup and Imports

In [None]:
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import warnings
import time
warnings.filterwarnings('ignore')

# Add src to path if needed
sys.path.insert(0, str(Path.cwd().parent / 'src'))

from snowpyt_mechparams import ExecutionEngine
from snowpyt_mechparams.graph import graph
from snowpyt_mechparams.algorithm import find_parameterizations
from snowpyt_mechparams.snowpilot_utils import parse_caaml_file
from snowpyt_mechparams.data_structures import Pit

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 8)

print("✓ Imports successful")
print(f"\n⚠️  TEST MODE: Will process only 1,000 pits")

## 2. Verify Graph Structure and Pathway Count

In [None]:
# Check that D11 node exists
D11_node = graph.get_node("D11")
print(f"D11 node: {D11_node}")
print(f"D11 node type: {D11_node.type}")

# Find all pathways to D11
all_D11_pathways = find_parameterizations(graph, D11_node)
print(f"\n✓ Total D11 calculation pathways: {len(all_D11_pathways)}")

# Show first 3 pathways as examples
print("\nExample pathways:")
for i, param in enumerate(all_D11_pathways[:3], 1):
    print(f"\n{i}. {param}")

## 3. Load ECTP Slabs from Dataset (TEST: 1000 pits)

In [None]:
# Set data directory path
data_dir = Path.cwd() / 'data'
print(f"Data directory: {data_dir}")
print(f"Exists: {data_dir.exists()}")

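# Get all CAAML files (sorted so the sample is reproducible across filesystems)
all_caaml_files = sorted(data_dir.glob('snowpits-*-caaml.xml'))
print(f"\nTotal CAAML files available: {len(all_caaml_files):,}")

# TEST MODE: draw a reproducible random sample of 1000 files
import random
TEST_SIZE = 1000
rng = random.Random(42)
caaml_files = rng.sample(all_caaml_files, k=min(TEST_SIZE, len(all_caaml_files)))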
print(f"\n⚠️  TEST MODE: Processing only {len(caaml_files):,} files")
print(f"   ({100 * len(caaml_files) / len(all_caaml_files):.1f}% of total dataset)")

In [None]:
# Parse CAAML files and create Pits
print("Parsing CAAML files and creating Pits...")
start_time = time.time()

pits = []
failed_files = []

for filepath in tqdm(caaml_files, desc="Processing files"):
    try:
        # Parse CAAML file to get snowpylot SnowPit
        snow_pit = parse_caaml_file(str(filepath))
        
        # Create Pit object
        pit = Pit.from_snow_pit(snow_pit)
        pits.append(pit)
        
    except Exception as e:
        failed_files.append((filepath.name, str(e)))

parse_time = time.time() - start_time

print(f"\n✓ Successfully parsed: {len(pits):,} pits")
print(f"  Failed: {len(failed_files):,} files")
print(f"  Time: {parse_time:.1f} seconds ({parse_time/max(len(pits), 1):.3f} sec/pit)")

In [None]:
# Create slabs using ECTP failure layers
print("Creating slabs from ECTP failures...")
start_time = time.time()

all_slabs = []
pits_with_ectp = 0

for pit in tqdm(pits, desc="Creating slabs"):
    # Create slabs based on ECTP failures
    slabs = pit.create_slabs(weak_layer_def="ECTP_failure_layer")
    
    if slabs:
        pits_with_ectp += 1
        all_slabs.extend(slabs)

slab_time = time.time() - start_time

print(f"\n✓ Total pits processed: {len(pits):,}")
print(f"  Pits with ECTP failures: {pits_with_ectp:,} ({100*pits_with_ectp/max(len(pits), 1):.1f}%)")
print(f"  Total ECTP slabs created: {len(all_slabs):,}")
print(f"  Time: {slab_time:.1f} seconds")

# Summary statistics
if len(all_slabs) > 0:
    slab_thicknesses = [slab.total_thickness for slab in all_slabs if slab.total_thickness is not None]
    num_layers = [len(slab.layers) for slab in all_slabs]
    
    print(f"\nSlab characteristics:")
    print(f"  Mean thickness: {np.mean(slab_thicknesses):.1f} cm")
    print(f"  Mean # layers: {np.mean(num_layers):.1f}")
    print(f"  Total pathway executions expected: {len(all_slabs) * len(all_D11_pathways):,}")

## 4. Execute All Pathways for D11

This section executes all 32 pathways for each slab. The execution engine uses **dynamic programming** to cache computed values across pathways, avoiding redundant calculations.

**Test configuration**: Processing ~300-400 slabs × 32 pathways = ~9,600-12,800 executions
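The caching described above can be illustrated with `functools.lru_cache`. This is a swapped-in illustration, not how `ExecutionEngine` is implemented: the step functions are hypothetical, return strings instead of real values, and the engine's internal cache may be keyed differently (for example per layer as well as per method).

```python
from functools import lru_cache
from itertools import product

# Hypothetical counters recording how many times each step is actually computed
calls = {"density": 0, "E": 0, "nu": 0}


@lru_cache(maxsize=None)
def density(rho_method):
    calls["density"] += 1
    return f"rho<{rho_method}>"


@lru_cache(maxsize=None)
def elastic_modulus(rho_method, e_method):
    calls["E"] += 1
    # E depends on density, so it reuses the cached density value
    return f"E<{e_method}>({density(rho_method)})"


@lru_cache(maxsize=None)
def poissons_ratio(nu_method):
    calls["nu"] += 1
    return f"nu<{nu_method}>"


def run_all_pathways_for_one_slab():
    """Evaluate all 4 x 4 x 2 pathways with a fresh cache (one slab)."""
    for fn in (density, elastic_modulus, poissons_ratio):
        fn.cache_clear()
    for key in calls:
        calls[key] = 0
    results = {}
    for rho, e, nu in product(["r1", "r2", "r3", "r4"], ["e1", "e2", "e3", "e4"], ["n1", "n2"]):
        results[(rho, e, nu)] = (elastic_modulus(rho, e), poissons_ratio(nu))
    return results


demo_results = run_all_pathways_for_one_slab()
print(len(demo_results), calls)  # 32 {'density': 4, 'E': 16, 'nu': 2}
```

Without caching, each step would run 32 times; with it, density runs 4 times, E 16 times, and ν twice. Clearing the caches at the start of each call mirrors caching within each slab.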

In [None]:
# Initialize execution engine
engine = ExecutionEngine(graph)

print(f"Executing all pathways for {len(all_slabs):,} slabs...")
print(f"Total pathway executions: {len(all_slabs) * len(all_D11_pathways):,}")
print("\nNote: Dynamic programming caches computed values within each slab.")
print("This avoids redundant calculations when pathways share common sub-paths.")
print("\nStarting execution...")

In [None]:
# Execute all pathways for each slab
results_data = []
start_time = time.time()

for slab_idx, slab in enumerate(tqdm(all_slabs, desc="Executing pathways")):
    try:
        # Execute all pathways (uses dynamic programming internally)
        results = engine.execute_all(
            slab=slab,
            target_parameter='D11',
            include_plate_theory=True
        )

        # Record results for each pathway
        for pathway_desc, result in results.results.items():
            # Extract D11 once; `is not None` checks avoid treating a falsy value as missing
            D11 = result.slab_result.D11 if result.slab_result is not None else None

            # Record even failed pathways (for data loss analysis)
            record = {
                'pit_id': slab.pit_id,
                'slab_id': slab.slab_id,
                'slab_index': slab_idx,
                'pathway_description': pathway_desc,
                'methods_used': str(result.methods_used),
                'success': result.success,
                'D11': D11.nominal_value if D11 is not None else None,
                'D11_uncertainty': D11.std_dev if D11 is not None else None,
                'num_layers': len(slab.layers),
                'slab_thickness_cm': slab.total_thickness,
                'slope_angle_deg': slab.angle,
            }

            # Any pathway that did not produce D11 counts as a failure,
            # including "successful" runs that returned no slab result
            if not result.success or D11 is None:
                # Attribute the failure to incomplete layer parameters and
                # count the layers missing each required input
                missing_E = sum(1 for lr in result.layer_results if lr.layer.elastic_modulus is None)
                missing_nu = sum(1 for lr in result.layer_results if lr.layer.poissons_ratio is None)
                missing_thickness = sum(1 for lr in result.layer_results if lr.layer.thickness is None)

                record['failure_reason'] = 'incomplete_layer_params'
                record['layers_missing_E'] = missing_E
                record['layers_missing_nu'] = missing_nu
                record['layers_missing_thickness'] = missing_thickness
                record['success'] = False

            results_data.append(record)

    except Exception as e:
        # Record complete failure
        results_data.append({
            'pit_id': slab.pit_id,
            'slab_id': slab.slab_id,
            'slab_index': slab_idx,
            'pathway_description': 'EXECUTION_ERROR',
            'success': False,
            'failure_reason': str(e),
        })

execution_time = time.time() - start_time

# Create DataFrame
df_results = pd.DataFrame(results_data)

print(f"\n{'='*60}")
print("EXECUTION COMPLETE")
print(f"{'='*60}")
print(f"Total pathway executions: {len(df_results):,}")
print(f"Successful calculations: {df_results['success'].sum():,}")
print(f"Failed calculations: {(~df_results['success']).sum():,}")
print(f"Success rate: {100 * df_results['success'].mean():.1f}%")
print(f"\nExecution time: {execution_time:.1f} seconds ({execution_time/60:.1f} minutes)")
print(f"Time per slab: {execution_time/len(all_slabs):.2f} seconds")
print(f"Time per pathway execution: {execution_time/len(df_results):.3f} seconds")

In [None]:
# Save raw results
output_file = 'D11_pathway_comparison_TEST_raw.csv'
df_results.to_csv(output_file, index=False)
print(f"\n✓ Raw results saved to: {output_file}")
print(f"  File size: {Path(output_file).stat().st_size / 1024:.1f} KB")

## 5. Data Loss Analysis by Pathway

Analyze which pathways have higher success rates and why others fail.
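One way to see why pathways fail is to compute marginal success rates for each method choice: if every pathway through one density method fails, that method is the data-loss bottleneck. The sketch below runs on synthetic rows with an invented failure rule; the column names are hypothetical, and with the real results the method labels would first have to be parsed out of `methods_used`, whose format is not shown here.

```python
from itertools import product

import pandas as pd

# Synthetic example: pathways using density method "d2" always fail (illustration only)
rows = [
    {"density_method": d, "E_method": e, "nu_method": n, "success": d != "d2"}
    for d, e, n in product(["d1", "d2"], ["e1", "e2"], ["n1", "n2"])
]
df_loss_demo = pd.DataFrame(rows)


def marginal_success_rates(df, method_cols):
    """Mean success rate for each method, one method dimension at a time."""
    return {col: df.groupby(col)["success"].mean().to_dict() for col in method_cols}


rates = marginal_success_rates(df_loss_demo, ["density_method", "E_method", "nu_method"])
print(rates["density_method"])
print(rates["E_method"])
```

Here the density dimension separates cleanly (one method always succeeds, the other never does), while the E and ν dimensions sit at the overall rate, pointing at density as the source of loss.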

In [None]:
print("="*80)
print("DATA LOSS ANALYSIS BY PATHWAY")
print("="*80)

# Group by pathway
pathway_stats = df_results.groupby('pathway_description').agg({
    'success': ['sum', 'count'],
    'D11': 'count'
}).reset_index()

pathway_stats.columns = ['pathway', 'successful', 'total', 'D11_computed']
pathway_stats['success_rate_%'] = 100 * pathway_stats['successful'] / pathway_stats['total']
pathway_stats['failure_rate_%'] = 100 - pathway_stats['success_rate_%']

# Sort by success rate
pathway_stats_sorted = pathway_stats.sort_values('success_rate_%', ascending=False)

print("\nPathway Success Rates (Top 15):")
print(pathway_stats_sorted.head(15).to_string(index=False))

print("\n\nPathway Success Rates (Bottom 15):")
print(pathway_stats_sorted.tail(15).to_string(index=False))

In [None]:
# Visualize success rates
fig, ax = plt.subplots(figsize=(14, 10))
pathways_to_plot = pathway_stats_sorted.head(25)  # Top 25

y_pos = np.arange(len(pathways_to_plot))
colors = plt.cm.RdYlGn(pathways_to_plot['success_rate_%'] / 100)
ax.barh(y_pos, pathways_to_plot['success_rate_%'], alpha=0.7, color=colors)
ax.set_yticks(y_pos)
ax.set_yticklabels(pathways_to_plot['pathway'], fontsize=7)
ax.set_xlabel('Success Rate (%)', fontsize=12)
ax.set_title('D11 Calculation Success Rate by Pathway (Top 25) - TEST DATA', fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)
ax.invert_yaxis()
plt.tight_layout()
plt.savefig('D11_success_rates_by_pathway_TEST.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Figure saved: D11_success_rates_by_pathway_TEST.png")

In [None]:
# Analyze failure reasons
print("\n" + "="*80)
print("FAILURE REASON ANALYSIS")
print("="*80)

failed_df = df_results[~df_results['success']]
if len(failed_df) > 0 and 'failure_reason' in failed_df.columns:
    failure_counts = failed_df['failure_reason'].value_counts()
    print("\nFailure Reasons:")
    for reason, count in failure_counts.items():
        pct = 100 * count / len(failed_df)
        print(f"  {reason}: {count:,} ({pct:.1f}%)")
    
    # Analyze missing layer parameters
    if 'layers_missing_E' in failed_df.columns:
        print("\nMissing Layer Parameters (among failures):")
        print(f"  Average layers missing E: {failed_df['layers_missing_E'].mean():.2f}")
        print(f"  Average layers missing ν: {failed_df['layers_missing_nu'].mean():.2f}")
        print(f"  Average layers missing thickness: {failed_df['layers_missing_thickness'].mean():.2f}")

## 6. D11 Statistics by Individual Pathway

Compare D11 distributions across different calculation pathways.

In [None]:
print("="*80)
print("D11 STATISTICS BY PATHWAY")
print("="*80)

# Filter to successful calculations only
df_success = df_results[df_results['success']].copy()

print(f"\nSuccessful D11 calculations: {len(df_success):,}")
# slab_index is unique across all pits; slab_id may repeat between pits
print(f"Unique slabs with at least one successful pathway: {df_success['slab_index'].nunique():,}")
print(f"Unique pathways that succeeded: {df_success['pathway_description'].nunique()}")

if len(df_success) > 0:
    # Calculate statistics for each pathway
    pathway_D11_stats = df_success.groupby('pathway_description')['D11'].agg([
        'count',
        'mean',
        'median',
        'std',
        'min',
        'max'
    ]).reset_index()

    pathway_D11_stats.columns = ['pathway', 'n', 'mean_D11', 'median_D11', 'std_D11', 'min_D11', 'max_D11']
    pathway_D11_stats = pathway_D11_stats.sort_values('mean_D11', ascending=False)

    print("\nD11 Statistics by Pathway (N·mm) - Top 15:")
    print(pathway_D11_stats.head(15).to_string(index=False))
    
    # Save pathway statistics
    pathway_D11_stats.to_csv('D11_statistics_by_pathway_TEST.csv', index=False)
    print("\n✓ Pathway statistics saved to: D11_statistics_by_pathway_TEST.csv")
else:
    print("\n⚠️  No successful D11 calculations to analyze")

In [None]:
# Box plot comparison for top pathways by sample size
if len(df_success) > 0 and 'pathway_D11_stats' in locals():
    fig, ax = plt.subplots(figsize=(16, 10))

    # Get top pathways by sample size (limit to 15 for readability)
    n_pathways = min(15, len(pathway_D11_stats))
    top_pathways = pathway_D11_stats.nlargest(n_pathways, 'n')['pathway'].tolist()
    df_top = df_success[df_success['pathway_description'].isin(top_pathways)]

    # Create box plot
    bp = df_top.boxplot(column='D11', by='pathway_description', ax=ax, rot=90, patch_artist=True)
    ax.set_ylabel('D11 (N·mm)', fontsize=12)
    ax.set_xlabel('Pathway', fontsize=12)
    ax.set_title(f'D11 Distribution by Pathway (Top {n_pathways} by Sample Size) - TEST DATA', 
                 fontsize=14, fontweight='bold')
    plt.suptitle('')  # Remove automatic title
    plt.xticks(fontsize=8)
    plt.tight_layout()
    plt.savefig('D11_boxplot_by_pathway_TEST.png', dpi=300, bbox_inches='tight')
    plt.show()

    print("✓ Figure saved: D11_boxplot_by_pathway_TEST.png")
else:
    print("⚠️  Skipping box plot - insufficient data")

## 7. D11 Variability Across Pathways (Per Slab)

For each slab, analyze how D11 varies across different calculation pathways.
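The per-slab variability metric used below is the coefficient of variation, CV = std / mean of D11 across a slab's successful pathways. A small worked example on synthetic values shows two details that affect the results: pandas' `std` is the sample standard deviation (`ddof=1`), and a slab with only one successful pathway gets `NaN` for both `std` and CV.

```python
import pandas as pd

# Synthetic D11 values (N·mm) for two slabs, for illustration only
df_cv_demo = pd.DataFrame({
    "slab_index": [0, 0, 0, 1],
    "D11": [90.0, 100.0, 110.0, 250.0],
})

stats = df_cv_demo.groupby("slab_index")["D11"].agg(["count", "mean", "std"])
# Coefficient of variation across pathways, per slab
stats["cv"] = stats["std"] / stats["mean"]
print(stats)
```

Slab 0 has mean 100 and sample standard deviation 10, so CV = 0.1; slab 1 has a single value, so its CV is `NaN`, which is why the histograms drop missing values before plotting.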

In [None]:
print("="*80)
print("D11 VARIABILITY ACROSS PATHWAYS (PER SLAB)")
print("="*80)

if len(df_success) > 0:
    # For each slab, calculate statistics across all successful pathways.
    # Group by slab_index, which is unique across all pits (slab_id may repeat between pits)
    slab_D11_variability = df_success.groupby('slab_index')['D11'].agg([
        'count',  # Number of successful pathways
        'mean',   # Mean D11 across pathways
        'std',    # Sample std dev across pathways (NaN if only one pathway succeeded)
        'min',    # Min D11
        'max',    # Max D11
    ]).reset_index()

    slab_D11_variability.columns = ['slab_index', 'n_pathways', 'mean_D11', 'std_D11', 'min_D11', 'max_D11']
    slab_D11_variability['range_D11'] = slab_D11_variability['max_D11'] - slab_D11_variability['min_D11']
    slab_D11_variability['cv_D11'] = slab_D11_variability['std_D11'] / slab_D11_variability['mean_D11']  # Coefficient of variation

    # Merge with slab properties (one row per slab)
    slab_props = df_success[['slab_index', 'pit_id', 'slab_id', 'num_layers', 'slab_thickness_cm', 'slope_angle_deg']].drop_duplicates(subset='slab_index')
    slab_D11_variability = slab_D11_variability.merge(slab_props, on='slab_index')

    print(f"\nSlabs with successful D11 calculations: {len(slab_D11_variability):,}")
    print(f"Average successful pathways per slab: {slab_D11_variability['n_pathways'].mean():.1f}")

    print("\nD11 Variability Statistics:")
    print(slab_D11_variability[['mean_D11', 'std_D11', 'cv_D11', 'range_D11']].describe())
    
    # Save slab-level statistics
    slab_D11_variability.to_csv('D11_variability_by_slab_TEST.csv', index=False)
    print("\n✓ Slab variability statistics saved to: D11_variability_by_slab_TEST.csv")
else:
    print("\n⚠️  No successful D11 calculations to analyze")

In [None]:
# Histogram of coefficient of variation and other variability metrics
if len(df_success) > 0 and 'slab_D11_variability' in locals():
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))

    # CV histogram
    axes[0, 0].hist(slab_D11_variability['cv_D11'].dropna(), bins=30, edgecolor='black', alpha=0.7, color='steelblue')
    axes[0, 0].set_xlabel('Coefficient of Variation', fontsize=10)
    axes[0, 0].set_ylabel('Frequency', fontsize=10)
    axes[0, 0].set_title('D11 Coefficient of Variation Across Pathways', fontsize=11, fontweight='bold')
    axes[0, 0].axvline(slab_D11_variability['cv_D11'].median(), color='red', linestyle='--', 
                       label=f'Median: {slab_D11_variability["cv_D11"].median():.3f}')
    axes[0, 0].legend()

    # Std dev histogram
    axes[0, 1].hist(slab_D11_variability['std_D11'].dropna(), bins=30, edgecolor='black', alpha=0.7, color='coral')
    axes[0, 1].set_xlabel('Standard Deviation (N·mm)', fontsize=10)
    axes[0, 1].set_ylabel('Frequency', fontsize=10)
    axes[0, 1].set_title('D11 Standard Deviation Across Pathways', fontsize=11, fontweight='bold')

    # Range histogram
    axes[1, 0].hist(slab_D11_variability['range_D11'].dropna(), bins=30, edgecolor='black', alpha=0.7, color='mediumseagreen')
    axes[1, 0].set_xlabel('Range (N·mm)', fontsize=10)
    axes[1, 0].set_ylabel('Frequency', fontsize=10)
    axes[1, 0].set_title('D11 Range Across Pathways', fontsize=11, fontweight='bold')

    # Number of successful pathways
    axes[1, 1].hist(slab_D11_variability['n_pathways'], 
                    bins=range(int(slab_D11_variability['n_pathways'].min()), 
                              int(slab_D11_variability['n_pathways'].max())+2), 
                    edgecolor='black', alpha=0.7, color='mediumpurple')
    axes[1, 1].set_xlabel('Number of Successful Pathways', fontsize=10)
    axes[1, 1].set_ylabel('Frequency', fontsize=10)
    axes[1, 1].set_title('Successful Pathways per Slab', fontsize=11, fontweight='bold')

    plt.tight_layout()
    plt.savefig('D11_variability_distributions_TEST.png', dpi=300, bbox_inches='tight')
    plt.show()

    print("✓ Figure saved: D11_variability_distributions_TEST.png")
else:
    print("⚠️  Skipping variability plots - insufficient data")

## 8. Summary for Test Run

In [None]:
print("="*80)
print("TEST RUN SUMMARY")
print("="*80)

print(f"\n{'TEST CONFIGURATION:':<45}")
print(f"{'  Pits processed:':<45} {len(pits):,} / {len(all_caaml_files):,} ({100*len(pits)/len(all_caaml_files):.1f}%)")
print(f"{'  Slabs analyzed:':<45} {len(all_slabs):,}")
print(f"{'  Pathway executions:':<45} {len(df_results):,}")

print(f"\n{'EXECUTION PERFORMANCE:':<45}")
print(f"{'  Total time:':<45} {parse_time + slab_time + execution_time:.1f} seconds")
print(f"{'  Parsing time:':<45} {parse_time:.1f} seconds")
print(f"{'  Slab creation time:':<45} {slab_time:.1f} seconds")
print(f"{'  Pathway execution time:':<45} {execution_time:.1f} seconds")
print(f"{'  Time per slab:':<45} {execution_time/len(all_slabs):.2f} seconds")

print(f"\n{'RESULTS:':<45}")
print(f"{'  Successful D11 calculations:':<45} {len(df_success):,}")
print(f"{'  Overall success rate:':<45} {100 * df_results['success'].mean():.1f}%")

if len(df_success) > 0:
    print(f"{'  Slabs with successful pathways:':<45} {slab_D11_variability.shape[0] if 'slab_D11_variability' in locals() else 0:,}")
    print(f"{'  Unique successful pathways:':<45} {df_success['pathway_description'].nunique()}")
    print(f"\n{'  Mean D11:':<45} {df_success['D11'].mean():.1f} N·mm")
    print(f"{'  Median D11:':<45} {df_success['D11'].median():.1f} N·mm")
    print(f"{'  Std dev D11:':<45} {df_success['D11'].std():.1f} N·mm")
    
    if 'slab_D11_variability' in locals():
        print(f"\n{'  Mean pathway variability (CV):':<45} {slab_D11_variability['cv_D11'].mean():.3f}")
        print(f"{'  Median pathway variability (CV):':<45} {slab_D11_variability['cv_D11'].median():.3f}")

print("\n" + "="*80)
print("EXTRAPOLATION TO FULL DATASET")
print("="*80)

if len(all_slabs) > 0:
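    # Estimate for full dataset: scale by files sampled (not pits parsed),
    # so parse failures in the full dataset are accounted for
    scale_factor = len(all_caaml_files) / len(caaml_files)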
    estimated_total_slabs = int(len(all_slabs) * scale_factor)
    estimated_executions = estimated_total_slabs * len(all_D11_pathways)
    estimated_time = (execution_time / len(all_slabs)) * estimated_total_slabs

    print(f"\nBased on test results, for the full dataset we expect:")
    print(f"{'  Total slabs:':<45} ~{estimated_total_slabs:,}")
    print(f"{'  Total pathway executions:':<45} ~{estimated_executions:,}")
    print(f"{'  Estimated execution time:':<45} ~{estimated_time/60:.0f} minutes ({estimated_time/3600:.1f} hours)")
    print("\n  Note: this estimate scales the measured per-slab time, which already")
    print(f"  includes the caching speedup ({execution_time/len(df_results):.3f} sec per pathway execution).")

print("\n" + "="*80)
print("EXPORTED FILES (TEST)")
print("="*80)
exported_files = ['D11_pathway_comparison_TEST_raw.csv']
if 'pathway_D11_stats' in locals():
    exported_files.append('D11_statistics_by_pathway_TEST.csv')
if 'slab_D11_variability' in locals():
    exported_files.append('D11_variability_by_slab_TEST.csv')
exported_files.append('D11_success_rates_by_pathway_TEST.png')
if 'pathway_D11_stats' in locals():
    exported_files.append('D11_boxplot_by_pathway_TEST.png')
if 'slab_D11_variability' in locals():
    exported_files.append('D11_variability_distributions_TEST.png')
for i, name in enumerate(exported_files, 1):
    print(f"  {i}. {name}")

print("\n" + "="*80)
print("✓ TEST RUN COMPLETE - Ready to process full dataset")
print("="*80)