# All Density Calculation Pathways

This notebook analyzes snow density calculation methods at both **layer-level** and **slab-level** scales.

## Table of Contents

1. [Load Snow Pit Data](#1-load-snow-pit-data)
2. [Find All Density Calculation Pathways](#2-find-all-density-calculation-pathways)
3. [Layer-Level Analysis](#3-layer-level-analysis)
4. [Slab-Level Comparison (ECTP)](#4-slab-level-comparison-ectp)
5. [Density Distribution by Pathway](#5-density-distribution-by-pathway)

**Target Parameter**: `density` — snow layer density in kg/m³

Reported uncertainties are propagated from input measurement uncertainties only; the standard error of each method's regression is excluded. The assumed input uncertainties are ±10% for direct density measurements, ±0.67 on the hand hardness index, and ±0.5 mm for grain size.
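As a rough illustration of what propagating an input uncertainty means, the sketch below applies first-order (linear) error propagation to a made-up hardness-to-density relation. The function `toy_density_from_hardness` and its coefficients are hypothetical and are not one of the published regressions used in this notebook; the package's own propagation may work differently.

```python
def propagate(f, x, sigma_x, h=1e-6):
    """First-order propagation: sigma_f ~ |df/dx| * sigma_x (central difference)."""
    dfdx = (f(x + h) - f(x - h)) / (2 * h)
    return f(x), abs(dfdx) * sigma_x


# Hypothetical linear relation, for illustration only (NOT a published regression).
def toy_density_from_hardness(hh_index):
    return 100.0 + 50.0 * hh_index


# Direct measurement: the ±10% relative uncertainty scales with the value.
measured = 250.0
measured_std = 0.10 * measured

# Derived value: the ±0.67 hand hardness uncertainty is scaled by the local slope.
value, std = propagate(toy_density_from_hardness, 3.0, 0.67)

print(f"direct:  {measured:.0f} ± {measured_std:.0f} kg/m³ ({measured_std / measured:.1%})")
print(f"derived: {value:.0f} ± {std:.1f} kg/m³ ({std / value:.1%})")
```

With a single input the propagated standard deviation is just the input uncertainty multiplied by the local slope of the relation. With several independent inputs the per-input contributions would be combined in quadrature.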

In [1]:
from pathlib import Path
from typing import Dict, Any
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

from snowpyt_mechparams.snowpilot import parse_caaml_directory
from snowpyt_mechparams.data_structures import Pit, Slab
from snowpyt_mechparams.graph import graph, density
from snowpyt_mechparams.algorithm import find_parameterizations
from snowpyt_mechparams.execution import ExecutionEngine
from snowpyt_mechparams.execution.config import ExecutionConfig

## 1. Load Snow Pit Data

In [2]:
snow_pits_raw = parse_caaml_directory(str(Path("data")))
pits = [Pit.from_snow_pit(sp) for sp in snow_pits_raw]

print(f"Loaded {len(pits)} snow pits ({sum(len(pit.layers) for pit in pits)} layers)")

Loaded 50278 snow pits (371429 layers)


## 2. Find All Density Calculation Pathways

In [3]:
pathways = find_parameterizations(graph, graph.get_node("density"))

print(f"Found {len(pathways)} pathways for calculating density:\n")
for i, pathway in enumerate(pathways, 1):
    print(f"Pathway {i}:")
    print(pathway)
    print()

Found 4 pathways for calculating density:

Pathway 1:
branch 1: snow_pit -- data_flow --> measured_density -- data_flow --> density

Pathway 2:
branch 1: snow_pit -- data_flow --> measured_hand_hardness -- data_flow --> merge_hand_hardness_grain_form
branch 2: snow_pit -- data_flow --> measured_grain_form -- data_flow --> merge_hand_hardness_grain_form
merge branch 1, branch 2: merge_hand_hardness_grain_form -- geldsetzer --> density

Pathway 3:
branch 1: snow_pit -- data_flow --> measured_hand_hardness -- data_flow --> merge_hand_hardness_grain_form
branch 2: snow_pit -- data_flow --> measured_grain_form -- data_flow --> merge_hand_hardness_grain_form
merge branch 1, branch 2: merge_hand_hardness_grain_form -- kim_jamieson_table2 --> density

Pathway 4:
branch 1: snow_pit -- data_flow --> measured_hand_hardness -- data_flow --> merge_hand_hardness_grain_form
branch 2: snow_pit -- data_flow --> measured_grain_form -- data_flow --> merge_hand_hardness_grain_form
branch 3: snow_pit -- da
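
The pathway listing above combines alternative methods with merge nodes that need all of their inputs. A minimal sketch of how such an enumeration could work is given here on a hypothetical toy graph. `TOY_GRAPH`, `enumerate_pathways`, `method_x`, and `method_y` are illustrative names, and this is not the implementation of `find_parameterizations`.

```python
from itertools import product

# Hypothetical toy graph (NOT the package's graph). Each node maps to a list of
# alternatives; each alternative is (method_name, [required input nodes]).
TOY_GRAPH = {
    "density": [
        ("data_flow", ["measured_density"]),
        ("method_x", ["hand_hardness", "grain_form"]),
        ("method_y", ["hand_hardness", "grain_form"]),
    ],
    "measured_density": [],  # leaves: values read directly from the pit
    "hand_hardness": [],
    "grain_form": [],
}


def enumerate_pathways(node, graph):
    """Return every way to compute `node` as nested (node, method, inputs) tuples."""
    alternatives = graph[node]
    if not alternatives:  # leaf measurement, nothing to compute
        return [(node, None, ())]
    pathways = []
    for method, inputs in alternatives:
        # A merge needs ALL inputs, so combine one pathway per input.
        per_input = [enumerate_pathways(i, graph) for i in inputs]
        for combo in product(*per_input):
            pathways.append((node, method, combo))
    return pathways


toy_pathways = enumerate_pathways("density", TOY_GRAPH)
for _, method, inputs in toy_pathways:
    print(method, "<-", [leaf for leaf, _, _ in inputs])
```

Alternatives at a node add to the pathway count, while the inputs of a merge multiply it. That is why a parameter with several upstream options can produce many pathways.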

## 3. Layer-Level Analysis

Each layer is analyzed independently as a single-layer slab, regardless of its parent pit.

In [4]:
engine = ExecutionEngine(graph)
config = ExecutionConfig(include_method_uncertainty=False)

# Build flat list of (layer, slope_angle, pit_id, layer_index)
layer_infos = []
for pit in pits:
    # Fall back to 0° when the slope angle is missing, NaN, or not numeric
    try:
        angle = float(pit.slope_angle)
        if np.isnan(angle):
            angle = 0.0
    except (TypeError, ValueError):
        angle = 0.0
    for idx, layer in enumerate(pit.layers):
        layer_infos.append((layer, angle, pit.pit_id, idx))

# Execute all pathways on each layer as a single-layer slab
all_results: Dict[str, Any] = {}
for layer, angle, pit_id, layer_idx in layer_infos:
    slab = Slab(layers=[layer], angle=angle, pit_id=pit_id)
    results = engine.execute_all(slab, "density", config=config)
    all_results[f"{pit_id}_L{layer_idx}"] = {
        'execution_results': results,
        'pit_id': pit_id,
    }

print(f"Executed {len(pathways)} pathways on {len(layer_infos)} layers")

Executed 4 pathways on 371429 layers


In [5]:
density_data = []
for layer_id, info in all_results.items():
    for pathway_desc, pathway_result in info['execution_results'].pathways.items():
        for trace in pathway_result.computation_trace:
            if trace.parameter == "density" and trace.success and trace.output is not None:
                out = trace.output
                if hasattr(out, 'nominal_value'):
                    val, std = out.nominal_value, out.std_dev
                else:
                    try:
                        val, std = float(out), 0.0
                    except (TypeError, ValueError):
                        continue
                density_data.append({
                    'layer_id': layer_id,
                    'pit_id': info['pit_id'],
                    'method': pathway_result.methods_used.get('density', 'unknown'),
                    'density': val,
                    'density_std': std,
                })

df_density = pd.DataFrame(density_data)
# Relative uncertainty; undefined (NaN) where the computed density is zero
df_density['rel_unc'] = np.where(
    df_density['density'] != 0,
    df_density['density_std'] / df_density['density'],
    np.nan,
)

total_layers = len(layer_infos)

summary = (
    df_density.groupby('method')
    .agg(layers=('layer_id', 'nunique'), avg_val=('density', 'mean'), avg_rel_unc=('rel_unc', 'mean'))
    .sort_values('layers', ascending=False)
    .reset_index()
)

print(f"  {'Method':<30s} {'Layers':>30s} {'Avg Value':>16s} {'Avg Rel. Uncertainty':>22s}")
print(f"  {'-'*100}")
for _, row in summary.iterrows():
    n = int(row['layers'])
    pct = n / total_layers
    count_str = f"{n} / {total_layers} ({pct:.1%})"
    val_str = f"{row['avg_val']:.0f} kg/m³"
    print(f"  {row['method']:<30s} {count_str:>30s}    {val_str:>14s}    {row['avg_rel_unc']:>18.1%}")
print()
print("  Note: Uncertainty is propagated from input measurement uncertainties only.")


  Method                                                 Layers        Avg Value   Avg Rel. Uncertainty
  ----------------------------------------------------------------------------------------------------
  kim_jamieson_table2                   235522 / 371429 (63.4%)         208 kg/m³                 16.4%
  geldsetzer                            200676 / 371429 (54.0%)         193 kg/m³                 17.1%
  kim_jamieson_table5                   104397 / 371429 (28.1%)         198 kg/m³                 18.4%
  data_flow                               10468 / 371429 (2.8%)         283 kg/m³                 10.0%

  Note: Uncertainty is propagated from input measurement uncertainties only.
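
The coverage percentages above sum to more than 100%, so some layers must be resolved by more than one method. A minimal sketch of how that overlap could be tabulated is given below, using made-up records shaped like the entries of `density_data`. The layer ids are illustrative and not taken from the dataset.

```python
from collections import defaultdict
from itertools import combinations

# Toy records, illustrative only.
records = [
    {"layer_id": "A_L0", "method": "geldsetzer"},
    {"layer_id": "A_L0", "method": "kim_jamieson_table2"},
    {"layer_id": "A_L1", "method": "kim_jamieson_table2"},
    {"layer_id": "B_L0", "method": "data_flow"},
    {"layer_id": "B_L0", "method": "geldsetzer"},
]

# Set of layers each method could resolve.
layers_by_method = defaultdict(set)
for rec in records:
    layers_by_method[rec["method"]].add(rec["layer_id"])

# Number of layers shared by each pair of methods.
overlap = {
    (a, b): len(layers_by_method[a] & layers_by_method[b])
    for a, b in combinations(sorted(layers_by_method), 2)
}

# Layers resolved by at least one method.
covered_by_any = set().union(*layers_by_method.values())

for pair, n_shared in overlap.items():
    print(pair, n_shared)
print("layers with any density:", len(covered_by_any))
```

The size of `covered_by_any` relative to the total layer count gives the combined coverage across all methods, which the per-method table does not show.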


## 4. Slab-Level Comparison (ECTP)

Slabs are created from each pit using the ECTP failure layer (ECTP: Extended Column Test with propagation) as the weak-layer definition.

In [6]:
# Create ECTP slabs
ectp_slabs = []
for pit in pits:
    for slab in pit.create_slabs(weak_layer_def="ECTP_failure_layer"):
        ectp_slabs.append({'slab': slab, 'n_layers': len(slab.layers)})

print(f"Created {len(ectp_slabs)} ECTP slabs")

Created 14776 ECTP slabs


In [7]:
# Execute all density pathways on each ECTP slab and count successes
# A slab succeeds for a pathway if ALL its layers have successful density calculations
pathway_slab_success: Dict[str, int] = {}

for info in ectp_slabs:
    slab = info['slab']
    n = info['n_layers']
    results = engine.execute_all(slab, "density", config=config)
    for pathway_result in results.pathways.values():
        method = pathway_result.methods_used.get('density', 'unknown')
        n_ok = sum(
            1 for t in pathway_result.computation_trace
            if t.parameter == "density" and t.success and t.output is not None
        )
        pathway_slab_success[method] = pathway_slab_success.get(method, 0) + (1 if n_ok == n else 0)

### Layer-Level vs Slab-Level Comparison

In [8]:
all_methods = sorted(
    set(df_density['method'].unique()) | set(pathway_slab_success.keys()),
    key=lambda m: df_density[df_density['method'] == m]['layer_id'].nunique() if m in df_density['method'].values else 0,
    reverse=True,
)

total_slabs = len(ectp_slabs)

print(f"  {'Method':<30s} {'Layers':>30s} {'Slabs (ECTP)':>30s}")
print(f"  {'-'*92}")
for method in all_methods:
    layer_n = df_density[df_density['method'] == method]['layer_id'].nunique() if method in df_density['method'].values else 0
    layer_str = f"{layer_n} / {total_layers} ({layer_n / total_layers:.1%})"
    slab_n = pathway_slab_success.get(method, 0)
    slab_str = f"{slab_n} / {total_slabs} ({slab_n / total_slabs:.1%})"
    print(f"  {method:<30s} {layer_str:>30s}    {slab_str:>30s}")

print()
print("  Slab success requires ALL layers in the slab to have successful density calculations.")


  Method                                                 Layers                   Slabs (ECTP)
  --------------------------------------------------------------------------------------------
  kim_jamieson_table2                   235522 / 371429 (63.4%)              5951 / 14776 (40.3%)
  geldsetzer                            200676 / 371429 (54.0%)              4539 / 14776 (30.7%)
  kim_jamieson_table5                   104397 / 371429 (28.1%)               1145 / 14776 (7.7%)
  data_flow                               10468 / 371429 (2.8%)                109 / 14776 (0.7%)

  Slab success requires ALL layers in the slab to have successful density calculations.
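
The all-layers rule is why slab coverage is lower than layer coverage for every method: one unresolved layer fails the whole slab. The sketch below shows the effect on made-up per-layer success flags for a single method. The slab names and flags are illustrative only.

```python
# Toy per-layer success flags for one method, illustrative only.
slabs = {
    "slab_1": [True, True, True],
    "slab_2": [True, False, True],  # one failed layer fails the whole slab
    "slab_3": [True, True],
}

n_layers = sum(len(flags) for flags in slabs.values())
layer_rate = sum(sum(flags) for flags in slabs.values()) / n_layers
slab_rate = sum(all(flags) for flags in slabs.values()) / len(slabs)

print(f"layer coverage: {layer_rate:.1%}, slab coverage: {slab_rate:.1%}")
```

Here 7 of 8 layers succeed but only 2 of 3 slabs do. The gap widens as slabs contain more layers.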


## 5. Density Distribution by Pathway

Side-by-side violin plots comparing the density distributions produced by each calculation method.
Violins are ordered left-to-right by layer coverage (highest to lowest).

In [9]:
import plotly.graph_objects as go

# Order methods by layer coverage (descending), matching the summary table
method_order = summary['method'].tolist()

METHOD_COLORS = {
    'kim_jamieson_table2': 'rgba( 68, 114, 196, 0.80)',
    'geldsetzer':          'rgba( 84, 168, 104, 0.80)',
    'kim_jamieson_table5': 'rgba(148, 103, 189, 0.80)',
    'data_flow':           'rgba(196, 140,  68, 0.80)',
}

fig = go.Figure()

for method in method_order:
    vals = df_density.loc[df_density['method'] == method, 'density'].values
    row  = summary[summary['method'] == method].iloc[0]
    n    = int(row['layers'])
    pct  = n / total_layers

    fig.add_trace(go.Violin(
        y=vals,
        name=f"<b>{method}</b><br><sup>{n:,} layers ({pct:.1%})</sup>",
        box_visible=True,
        meanline_visible=True,
        points=False,
        fillcolor=METHOD_COLORS.get(method, 'rgba(128,128,128,0.6)'),
        line_color=METHOD_COLORS.get(method, 'rgba(128,128,128,0.6)').replace('0.80', '1.0'),
        opacity=0.85,
    ))

fig.update_layout(
    title=dict(
        text=(
            '<b>Snow Density Distribution by Calculation Method</b><br>'
            '<sup>Layer-level calculations: violin width ∝ estimated frequency of values, '
            'box shows IQR, line shows mean</sup>'
        ),
        x=0.5, xanchor='center', font=dict(size=13),
    ),
    yaxis=dict(
        title='Density (kg/m³)',
        range=[0, 600],
        gridcolor='rgba(200,200,200,0.4)',
        zeroline=False,
    ),
    xaxis=dict(title='Calculation Method'),
    showlegend=False,
    width=820,
    height=520,
    plot_bgcolor='white',
    margin=dict(l=60, r=30, t=100, b=60),
)

fig.show()
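
For a numeric counterpart to the violins, the sketch below computes the statistics the plot marks (quartiles, IQR, and mean) with the standard library. The values and method names are made up, and quartile conventions differ between tools, so the numbers need not match Plotly's box exactly.

```python
import statistics

# Toy density values in kg/m³, illustrative only.
toy = {
    "method_a": [100.0, 150.0, 200.0, 250.0, 300.0],
    "method_b": [180.0, 200.0, 220.0, 240.0],
}


def box_stats(values):
    """Quartiles (inclusive method), IQR, and mean of a list of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "mean": statistics.fmean(values),
    }


stats = {name: box_stats(vals) for name, vals in toy.items()}
for name, s in stats.items():
    print(f"{name}: median={s['median']:.0f}, IQR={s['iqr']:.0f}, mean={s['mean']:.0f} kg/m³")
```

In the notebook the same function could be applied to the `density` values of each method in `df_density`.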