# 08a: Custom Labeling Playbook

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/08a_labeled_data_custom_labels.ipynb)

Build advanced tagging strategies that combine meteorological context, snowpack structure, and historical observations.


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.


In [None]:
%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow


In [None]:
import pandas as pd
import numpy as np
import xarray as xr
import seaborn as sns
import xsnow

sns.set(style='whitegrid', context='talk')


In [None]:
print("Loading xsnow sample data...")
try:
    ds = xsnow.single_profile_timeseries()
    # Use the underlying xarray.Dataset if the object exposes one via `.data`;
    # otherwise use the object directly
    base_ds = getattr(ds, 'data', ds)
    print("✅ Data loaded successfully!")
except Exception as exc:
    print(f"❌ Error loading sample data: {exc}")
    print("\nMake sure xsnow is properly installed:\n"
          "  pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    base_ds = None


## Step 1: Define a Label Schema

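Encode your operation's labeling rules as data: each rule has a name, the criteria that trigger it, and a display color. Keeping the rules in one structure makes them easy to review, version, and reuse.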


In [None]:
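layer_schema = [
    {
        'name': 'storm_slab',
        'criteria': {
            'new_snow': {'threshold_cm': 30},   # new snow >= 30 cm
            'wind_speed': {'threshold_ms': 8}   # wind speed >= 8 m/s
        },
        'color': '#ef8a62'  # display color for plots
    },
    {
        'name': 'persistent_weak_layer',
        'criteria': {
            # Threshold on the layer-to-layer temperature gradient; it must match
            # the units of the derived gradient (e.g. °C/m)
            'temperature_gradient': {'min': 8},
            # Grain-type codes: FC = faceted crystals, DH = depth hoar
            'grain_type': {'values': ['FC', 'DH']}
        },
        'color': '#67a9cf'
    },
]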
layer_schema
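
Because the schema is plain data, a typo (a misspelled criterion or a malformed color) would otherwise fail silently in Step 3. A minimal validation sketch, assuming the four criterion names and parameter keys used above are the only ones supported; `validate_schema`, `KNOWN_CRITERIA`, and the example rules are hypothetical and not part of xsnow:

```python
# Hypothetical helper, not part of xsnow: checks a label schema's structure.
KNOWN_CRITERIA = {
    "new_snow": {"threshold_cm"},
    "wind_speed": {"threshold_ms"},
    "temperature_gradient": {"min"},
    "grain_type": {"values"},
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def validate_schema(schema):
    """Return a list of problems found in a label schema (empty if none)."""
    problems = []
    seen_names = set()
    for i, rule in enumerate(schema):
        name = rule.get("name")
        if not name:
            problems.append(f"rule {i}: missing 'name'")
        elif name in seen_names:
            problems.append(f"rule {i}: duplicate name {name!r}")
        seen_names.add(name)

        criteria = rule.get("criteria") or {}
        if not criteria:
            problems.append(f"rule {i}: no criteria defined")
        for key, params in criteria.items():
            expected = KNOWN_CRITERIA.get(key)
            if expected is None:
                problems.append(f"rule {i}: unknown criterion {key!r}")
            elif set(params) != expected:
                problems.append(f"rule {i}: {key!r} expects keys {sorted(expected)}")

        # Colors must be '#rrggbb' strings (checked left to right, so short-circuits)
        color = rule.get("color", "")
        if not (isinstance(color, str) and len(color) == 7
                and color[0] == "#" and set(color[1:]) <= HEX_DIGITS):
            problems.append(f"rule {i}: color must be a '#rrggbb' hex string")
    return problems


# Hypothetical example: one valid rule and one broken rule
example_schema = [
    {"name": "storm_slab",
     "criteria": {"new_snow": {"threshold_cm": 30}, "wind_speed": {"threshold_ms": 8}},
     "color": "#ef8a62"},
    {"name": "bad_rule", "criteria": {"snow_depth": {"min": 100}}, "color": "red"},
]
for problem in validate_schema(example_schema):
    print(problem)
```

Run against `layer_schema` from the cell above, `validate_schema` returns an empty list; the broken example reports the unknown `snow_depth` criterion and the non-hex color.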


## Step 2: Compute Derived Inputs

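Calculate the numeric indicators the schema refers to. Only indicators available in the dataset are collected; a rule criterion whose input is missing is skipped in the next step.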


In [None]:
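derived_inputs = {}
if base_ds is not None:
    if 'temperature' in base_ds and 'z' in base_ds.coords:
        temp = base_ds['temperature']
        z = base_ds['z']
        # Finite difference between each layer and the next one along `layer`.
        # shift() keeps the layer coordinate aligned with base_ds; the last layer
        # has no neighbour and becomes NaN, which never passes a >= threshold.
        # Units follow the dataset (e.g. °C per unit of z).
        grad = (temp.shift(layer=-1) - temp) / (z.shift(layer=-1) - z)
        # Use the magnitude: the sign depends on how layers are ordered
        derived_inputs['temperature_gradient'] = abs(grad)
    # Pass through indicators that the dataset already provides
    for name in ('new_snow', 'wind_speed', 'grain_type'):
        if name in base_ds:
            derived_inputs[name] = base_ds[name]
    print('Derived inputs:', list(derived_inputs))
else:
    print('Load sample data before computing derived inputs.')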
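
The temperature gradient is a finite difference, ΔT/Δz, between adjacent layers, and the schema threshold is only meaningful if the units match. A NumPy sketch for a single hypothetical profile, assuming temperatures in °C and layer heights in cm (check the `units` attributes of your own dataset), converting to °C/m and taking the magnitude; `layer_gradient` and the sample values are illustrative only:

```python
import numpy as np

# Hypothetical single profile, ordered bottom to top.
# Assumed units: layer heights in cm, temperatures in degC.
z_cm = np.array([10.0, 25.0, 40.0, 50.0, 60.0])
temp_c = np.array([-0.5, -1.5, -3.0, -5.0, -8.0])


def layer_gradient(temp_c, z_cm):
    """Absolute temperature gradient (degC/m) from each layer to the next.

    The last layer has no neighbour above it, so it gets NaN rather than a
    duplicated value; NaN >= threshold evaluates to False.
    """
    dz_m = np.diff(z_cm) / 100.0            # cm -> m
    grad = np.abs(np.diff(temp_c) / dz_m)   # magnitude; sign depends on ordering
    return np.append(grad, np.nan)


grad = layer_gradient(temp_c, z_cm)
print(np.round(grad, 1))
print(grad >= 8)  # layers that pass a 'min': 8 degC/m criterion
```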


## Step 3: Apply Schema Rules

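For each rule, compare the derived inputs against the rule's criteria and build one boolean mask per label. A layer is flagged when any of the rule's criteria is met (logical OR).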


In [None]:
def criterion_mask(params, values):
    """Boolean mask for one schema criterion (NaN comparisons are False)."""
    if 'values' in params:
        return values.isin(params['values'])
    threshold = next(params[k] for k in ('min', 'threshold_cm', 'threshold_ms') if k in params)
    return values >= threshold


masks = {}
if base_ds is not None:
    base_template = base_ds['density'] if 'density' in base_ds else list(base_ds.data_vars.values())[0]
    for rule in layer_schema:
        # Start from an all-False mask with the template's dims and coords
        mask_array = xr.zeros_like(base_template, dtype=bool)
        # OR together the rule's own criteria; skip inputs missing from the data
        for key, params in rule['criteria'].items():
            if key in derived_inputs:
                mask_array = mask_array | criterion_mask(params, derived_inputs[key])
        masks[rule['name']] = mask_array
masks
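
The cell above combines a rule's criteria with logical OR, so either a strong gradient or a faceted grain type is enough to flag a persistent weak layer. Whether OR or AND is appropriate is an operational choice the schema does not currently record. A NumPy sketch on hypothetical data, with a hypothetical `combine` option (`"any"` for OR, `"all"` for AND) that a schema could carry per rule:

```python
import numpy as np

THRESHOLD_KEYS = ("min", "threshold_cm", "threshold_ms")


def criterion_mask(params, values):
    """Boolean mask for one criterion; NaN >= threshold evaluates to False."""
    if "values" in params:
        return np.isin(values, params["values"])
    threshold = next(params[k] for k in THRESHOLD_KEYS if k in params)
    return values >= threshold


def apply_rule(criteria, indicators, combine="any"):
    """Combine a rule's criterion masks with OR ('any') or AND ('all').

    Criteria whose indicator is missing are skipped, as in the Step 3 cell.
    Returns None when none of the rule's indicators are available.
    """
    masks = [criterion_mask(params, indicators[key])
             for key, params in criteria.items() if key in indicators]
    if not masks:
        return None
    stacked = np.stack(masks)
    return stacked.all(axis=0) if combine == "all" else stacked.any(axis=0)


# Hypothetical per-layer indicators for one 5-layer profile
indicators = {
    "temperature_gradient": np.array([6.7, 10.0, 20.0, 30.0, np.nan]),
    "grain_type": np.array(["RG", "FC", "RG", "DH", "PP"]),
}
pwl_criteria = {"temperature_gradient": {"min": 8}, "grain_type": {"values": ["FC", "DH"]}}

print("any:", apply_rule(pwl_criteria, indicators, combine="any"))
print("all:", apply_rule(pwl_criteria, indicators, combine="all"))
```

With OR, layers 2 to 4 are flagged; with AND, only the FC and DH layers that also exceed the gradient threshold remain.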


## Step 4: Assemble a Label Dataset

Combine the boolean masks into an xarray Dataset, one variable per label, and attach the schema as metadata so the rules travel with the labels.


In [None]:
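import json

if base_ds is not None and masks:
    # netCDF has no boolean type, so store each mask as int8 (0/1)
    label_ds = xr.Dataset({name: mask.astype('int8') for name, mask in masks.items()})
    # netCDF attributes cannot hold nested lists/dicts; store the schema as JSON text
    label_ds.attrs['schema'] = json.dumps(layer_schema)
    display(label_ds)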
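
netCDF attributes can only hold text, numbers, or one-dimensional arrays of a single type, so a nested list of dictionaries like `layer_schema` cannot be written as an attribute directly. One common workaround, sketched here with only the standard library, is to store the schema as a JSON string and decode it after loading:

```python
import json

# A copy of the Step 1 schema
schema = [
    {"name": "storm_slab",
     "criteria": {"new_snow": {"threshold_cm": 30}, "wind_speed": {"threshold_ms": 8}},
     "color": "#ef8a62"},
    {"name": "persistent_weak_layer",
     "criteria": {"temperature_gradient": {"min": 8}, "grain_type": {"values": ["FC", "DH"]}},
     "color": "#67a9cf"},
]

# Serialize to a plain string, which is a valid netCDF attribute value
schema_attr = json.dumps(schema)

# After reading the file back, json.loads recovers the original structure
restored = json.loads(schema_attr)
print(type(schema_attr).__name__, restored == schema)
```

JSON turns tuples into lists, so the round trip is exact only if the schema sticks to dicts, lists, strings, and numbers, as it does here.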


## Step 5: Persist and Share

Write the label dataset to netCDF so downstream visualization tools, or other team members, can load it with `xr.open_dataset`.


In [None]:
if 'label_ds' in locals():
    path = 'custom_labels.nc'
    label_ds.to_netcdf(path)
    print(f"Saved label dataset to {path}.")
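
Downstream tools need both the masks and the rules that produced them. Assuming the schema is stored as a JSON string attribute, a consumer script might read it back from `xr.open_dataset(path).attrs['schema']` and pull out each label's display color for plotting. The sketch below replaces the file read with an in-memory string so it runs on its own; `label_colors` and its gray default are hypothetical:

```python
import json

# Stand-in for xr.open_dataset("custom_labels.nc").attrs["schema"],
# assuming the schema was saved as a JSON string (criteria omitted for brevity)
schema_attr = json.dumps([
    {"name": "storm_slab", "criteria": {}, "color": "#ef8a62"},
    {"name": "persistent_weak_layer", "criteria": {}, "color": "#67a9cf"},
])


def label_colors(schema_attr, default="#999999"):
    """Map each label name to its display color from a JSON-encoded schema."""
    return {rule["name"]: rule.get("color", default) for rule in json.loads(schema_attr)}


colors = label_colors(schema_attr)
print(colors)
```

Each entry can then be passed as the `color=` argument of a matplotlib plotting call when drawing that label's mask, so every team renders the same label in the same color.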


## Summary

- Encode labeling rules as data (a schema) to keep analyses reproducible.
- Derive necessary inputs (gradients, thresholds) before labeling.
- Store labels with metadata for consistency across teams.
