# 08: Working with Labeled Snowpack Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/08_labeled_data.ipynb)

Snowpack analysts rely on categorical descriptors (grain types, layers, weak interfaces) to understand hazard evolution. This notebook demonstrates how to manage, enrich, and visualize labeled data in xsnow.

## What You'll Learn

- Discovering categorical variables in xsnow datasets
- Creating derived layer labels from numeric thresholds
- Summarizing labeled data with pandas-style tables
- Visualizing labeled layers through heatmaps and stacked charts
- Exporting label metadata for downstream tools


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.


In [None]:
%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow


In [None]:
import pandas as pd
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import xsnow

sns.set(style='whitegrid', context='talk')


In [None]:
print("Loading xsnow sample data...")
print("Using xsnow.single_profile_timeseries()")
print()

try:
    ds = xsnow.single_profile_timeseries()
    # Use the object's `.data` attribute when it has one, otherwise the object itself
    base_ds = getattr(ds, 'data', ds)
    print("✅ Data loaded successfully!")
    print(base_ds)
except Exception as exc:
    print(f"❌ Error loading sample data: {exc}")
    print("\nMake sure xsnow is properly installed:")
    print("  pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    ds = None
    base_ds = None


## Part 1: Inspect Available Labels

Identify the variables that encode categorical or string-based information in the dataset.


In [None]:
if base_ds is not None:
    label_vars = []
    for name, data_array in base_ds.data_vars.items():
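        # String-like dtypes have kind 'U' (unicode), 'S' (bytes), or 'O' (object)
        is_string_like = data_array.dtype.kind in ('U', 'S', 'O')
        has_label_name = 'label' in name.lower() or 'type' in name.lower()
        if is_string_like or has_label_name: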
            label_vars.append(name)
    if label_vars:
        print("Potential label variables:")
        for var in label_vars:
            attrs = base_ds[var].attrs
            description = attrs.get('long_name', attrs.get('description', ''))
            print(f"  - {var}: {description}")
    else:
        print("No obvious string label variables detected; consider generating labels from numeric fields.")
else:
    print("Dataset not loaded. Restart the notebook and ensure xsnow is installed.")
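
The dtype test in this step can be tried in isolation. The sketch below uses small NumPy stand-in arrays (the names `grain_label`, `remarks`, and `density` are illustrative, not confirmed xsnow variables) to show how `dtype.kind` separates string-like arrays from numeric ones, and that the string form of a unicode dtype does not begin with `str`.

```python
import numpy as np

# Stand-in arrays; the names are illustrative, not actual xsnow fields.
arrays = {
    "grain_label": np.array(["PP", "DF", "FC"]),         # fixed-width unicode
    "remarks": np.array(["crust", None], dtype=object),  # generic Python objects
    "density": np.array([120.0, 250.0, 380.0]),          # float
}


def is_string_like(arr):
    """Return True for unicode ('U'), byte-string ('S'), or object ('O') arrays."""
    return arr.dtype.kind in ("U", "S", "O")


string_like = [name for name, arr in arrays.items() if is_string_like(arr)]
print(string_like)

# The textual dtype of a unicode array is a width code, not the word 'str'
print(str(arrays["grain_label"].dtype))
```

Object arrays can also hold non-string values, so a match on kind `'O'` is only a hint that the variable may carry labels.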


## Part 2: Create Derived Layer Labels

Convert numeric measurements into categorical classes. The cell below applies heuristic temperature and density thresholds; the same pattern extends to other fields such as hardness or temperature gradient.


In [None]:
if base_ds is not None and 'temperature' in base_ds and 'density' in base_ds:
    temperature = base_ds['temperature']
    density = base_ds['density']
    layer_class = xr.DataArray(
        np.full(temperature.shape, 'normal pack', dtype=object),
        coords=temperature.coords,
        dims=temperature.dims,
        name='layer_class'
    )
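    # where(cond, other) keeps values where cond is True, so each rule is negated.
    # Rules run in order and later ones overwrite earlier ones ('dense slab' wins).
    layer_class = layer_class.where(~((temperature > -2) & (density < 200)), 'near-melting low density')
    layer_class = layer_class.where(~((temperature < -8) & (density < 250)), 'potential faceting')
    layer_class = layer_class.where(~(density > 350), 'dense slab')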
    layer_class.attrs['description'] = 'Heuristic layer classification based on temperature and density'
    classified_ds = base_ds.assign(layer_class=layer_class)
    print(classified_ds)
else:
    print('Temperature and density variables are required for this example.')
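
The same thresholds can also be written as an ordered rule table with `np.select`, where the first matching rule wins. This is a minimal sketch on made-up values, not the notebook's method: the rule order is chosen to mirror the precedence of the chained `where` calls, and mapping missing values to `'unknown'` is an assumption (the label appears in the Part 5 palette but is never assigned by the cell above).

```python
import numpy as np

# Made-up layer values for illustration (deg C and kg/m^3)
temperature = np.array([-1.0, -10.0, -5.0, -5.0, np.nan])
density = np.array([150.0, 220.0, 400.0, 300.0, np.nan])

# np.select takes the first condition that is True, so order encodes precedence
conditions = [
    np.isnan(temperature) | np.isnan(density),  # assumption: missing data -> 'unknown'
    density > 350,
    (temperature < -8) & (density < 250),
    (temperature > -2) & (density < 200),
]
choices = [
    "unknown",
    "dense slab",
    "potential faceting",
    "near-melting low density",
]

labels = np.select(conditions, choices, default="normal pack")
print(labels.tolist())
```

Without the first rule, comparisons against NaN evaluate to False, so empty layers would silently fall through to `'normal pack'`.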


## Part 3: Summaries and Frequency Tables

Convert labeled data into tidy tables for reporting.


In [None]:
if base_ds is not None:
    # Prefer the classified dataset from Part 2; fall back to the raw dataset
    dataset = locals().get('classified_ds', base_ds)
    if 'layer_class' in dataset.data_vars:
        df = dataset['layer_class'].to_dataframe().reset_index()
        counts = df.groupby(['time', 'layer_class']).size().unstack(fill_value=0)
        display(counts.head())
    else:
        print("No 'layer_class' variable present. Create derived labels in the previous step.")
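
Raw counts are hard to compare when the number of layers changes between time steps. As a sketch on a small synthetic table (the dates and labels are made up), `pd.crosstab` with `normalize='index'` turns the per-time counts into shares that sum to 1 in each row.

```python
import pandas as pd

# Synthetic long-format table: one row per (time, layer)
df = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01"] * 4 + ["2024-01-02"] * 4),
    "layer_class": [
        "normal pack", "normal pack", "dense slab", "potential faceting",
        "normal pack", "dense slab", "dense slab", "dense slab",
    ],
})

# Share of each class per time step; every row sums to 1
shares = pd.crosstab(df["time"], df["layer_class"], normalize="index")
print(shares.round(2))
```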


## Part 4: Visualize Labeled Layers

Use seaborn heatmaps to visualize how label categories evolve through time and depth.


In [None]:
if base_ds is not None:
    dataset = locals().get('classified_ds', base_ds)
    if 'layer_class' in dataset.data_vars and 'z' in dataset.coords:
        # Take the first location, slope, and realization to get a 2-D slice
        sample = dataset.isel(location=0, slope=0, realization=0)
        pivot = sample['layer_class'].to_pandas()
        # Encode against one shared category list so a code means the same class in every column
        categories = sorted(pd.Series(pivot.to_numpy().ravel()).dropna().unique())
        codes = pivot.apply(lambda col: pd.Categorical(col, categories=categories).codes)
        plt.figure(figsize=(12, 6))
        sns.heatmap(codes, cmap='Spectral', cbar=False)
        plt.title('Layer Class Codes (categorical heatmap)')
        plt.xlabel('Layer')
        plt.ylabel('Time index')
        plt.show()
    else:
        print("Need 'layer_class' variable and 'z' coordinate to plot the heatmap.")
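
A stacked bar chart is a complementary view of the same labels: it shows how many layers fall into each class per time step. The sketch below is one possible layout, using a synthetic counts table shaped like the Part 3 output; the numbers and dates are made up, and the colors are copied from the palette defined in Part 5.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic counts (rows: time steps, columns: classes), shaped like the Part 3 table
counts = pd.DataFrame(
    {
        "dense slab": [1, 3, 4],
        "normal pack": [5, 4, 3],
        "potential faceting": [2, 1, 1],
    },
    index=["2024-01-01", "2024-01-02", "2024-01-03"],
)
counts.index.name = "time"

# Colors taken from the Part 5 palette
palette = {
    "dense slab": "#ccebc5",
    "normal pack": "#decbe4",
    "potential faceting": "#b3cde3",
}

fig, ax = plt.subplots(figsize=(10, 5))
counts.plot(
    kind="bar",
    stacked=True,
    ax=ax,
    color=[palette[name] for name in counts.columns],
)
ax.set_xlabel("Time")
ax.set_ylabel("Number of layers")
ax.set_title("Layer classes per time step (synthetic data)")
ax.legend(title="layer_class")
fig.tight_layout()
plt.show()
```

With real data, the `counts` table from Part 3 could be passed in directly, provided every column name has an entry in the palette.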


## Part 5: Export Label Metadata

Package label descriptions and color palettes for other applications (e.g., dashboards).


In [None]:
if base_ds is not None:
    dataset = locals().get('classified_ds', base_ds)
    if 'layer_class' in dataset.data_vars:
        palette = {
            'near-melting low density': '#fbb4ae',
            'potential faceting': '#b3cde3',
            'dense slab': '#ccebc5',
            'normal pack': '#decbe4',
            'unknown': '#f2f2f2'
        }
        metadata = {
            'variable': 'layer_class',
            'description': dataset['layer_class'].attrs.get('description', ''),
            'palette': palette,
        }
        print(metadata)
    else:
        print("Generate layer_class before exporting metadata.")
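
Printing the dictionary shows its contents but does not yet hand it to another tool. One common option is JSON. The sketch below writes the metadata to a file and reads it back; the file name `layer_class_metadata.json` and the temporary directory are placeholders chosen for illustration, and the description string is the one set in Part 2.

```python
import json
import tempfile
from pathlib import Path

palette = {
    "near-melting low density": "#fbb4ae",
    "potential faceting": "#b3cde3",
    "dense slab": "#ccebc5",
    "normal pack": "#decbe4",
    "unknown": "#f2f2f2",
}
metadata = {
    "variable": "layer_class",
    "description": "Heuristic layer classification based on temperature and density",
    "palette": palette,
}

# Write to a temporary directory and read the file back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "layer_class_metadata.json"  # hypothetical file name
    out_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    restored = json.loads(out_path.read_text(encoding="utf-8"))

print(restored["palette"]["dense slab"])
```

Plain strings and nested dictionaries serialize without extra work; NumPy scalars or timestamps in the metadata would need converting to built-in Python types first.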


## Summary

- xsnow datasets often include categorical descriptors such as grain types.
- You can build additional labels from numeric thresholds for your workflow.
- Summaries and visualizations help communicate label distributions effectively.

**Next steps:** Dive deeper into custom tagging strategies in `08a_labeled_data_custom_labels.ipynb`.
