### Version compatibility check

This notebook compares the xsnow package installed in your environment with the documentation version it was written for. The helper below calls `scripts/check_docs_version.py` so you can confirm that the package and docs align before continuing.


In [None]:
from __future__ import annotations

import subprocess
import sys
from pathlib import Path
import warnings


def _find_script() -> Path | None:
    current = Path.cwd().resolve()
    for candidate in [current, *current.parents]:
        script = candidate / "scripts" / "check_docs_version.py"
        if script.exists():
            return script
    return None


def get_docs_version() -> tuple[str | None, str | None]:
    script_path = _find_script()
    if script_path is None:
        return None, "scripts/check_docs_version.py was not found"
    try:
        completed = subprocess.run(
            [sys.executable, str(script_path)],
            check=True,
            capture_output=True,
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        output = (exc.stdout or "") + (exc.stderr or "")
        return None, output.strip() or str(exc)
    return completed.stdout.strip() or None, None


docs_version, docs_error = get_docs_version()

try:
    import xsnow
    package_version = xsnow.__version__
except Exception as exc:  # pylint: disable=broad-except
    xsnow = None  # type: ignore[assignment]
    package_version = None
    package_error = str(exc)
else:
    package_error = None

print(f"xsnow package version: {package_version if package_version else 'not installed'}")
if package_error and not package_version:
    print(f"Import error: {package_error}")

if docs_version:
    print(f"xsnow docs version: {docs_version}")
else:
    message = "xsnow docs version: unavailable"
    if docs_error:
        message += f" ({docs_error})"
    print(message)

if docs_version and package_version and docs_version != package_version:
    warnings.warn(
        "xsnow package version differs from the documentation version. "
        "Consider aligning them before executing the notebook.",
        stacklevel=2,
    )

# 02: Basic Operations and Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/02_basic_operations_and_analysis.ipynb)

Now that you understand xsnow's data structure, let's learn how to work with the data: selecting, filtering, and performing basic analyses.

## What You'll Learn

- Selecting data by location, time, and layer
- Filtering data with conditions
- Computing profile-level summaries
- Calculating snow water equivalent (SWE)
- Identifying weak layers
- Time series operations


## Installation (For Colab Users)

Set `INSTALL_XSNOW = True` in the next cell if you need to install xsnow. When enabled, you can pick `INSTALL_METHOD = "pip"` to install published packages or `INSTALL_METHOD = "dev"` to work from a local clone. The `"pip"` path also installs the supporting scientific Python stack used throughout the course.


In [None]:
import subprocess
import sys
from pathlib import Path

INSTALL_XSNOW = False  # Set to True to install or update xsnow in this environment.
INSTALL_METHOD = "pip"  # Choose "pip" for a package install, or "dev" for a developer clone.
DEV_REPO_URL = "https://gitlab.com/avacollabra/postprocessing/xsnow.git"
DEV_CLONE_DIR = Path.home() / "xsnow-dev"


def _run(cmd: list[str]) -> None:
    print(f"$ {' '.join(cmd)}")
    subprocess.check_call(cmd)


try:
    import xsnow
    print(f"xsnow {xsnow.__version__} is already available.")
except Exception as exc:  # pylint: disable=broad-except
    xsnow = None  # type: ignore[assignment]
    print(f"xsnow is not currently available: {exc}")
    if not INSTALL_XSNOW:
        print("Set INSTALL_XSNOW = True and re-run this cell to install xsnow (pip or dev clone).")
    else:
        try:
            if INSTALL_METHOD == "pip":
                _run([sys.executable, "-m", "pip", "install", "--quiet", "numpy", "pandas", "xarray", "matplotlib", "seaborn", "dask", "netcdf4"])
                _run([sys.executable, "-m", "pip", "install", "--quiet", "git+https://gitlab.com/avacollabra/postprocessing/xsnow"])
            elif INSTALL_METHOD == "dev":
                if not DEV_CLONE_DIR.exists():
                    _run(["git", "clone", DEV_REPO_URL, str(DEV_CLONE_DIR)])
                _run([sys.executable, "-m", "pip", "install", "--quiet", "-e", str(DEV_CLONE_DIR)])
            else:
                raise ValueError(f"Unsupported INSTALL_METHOD: {INSTALL_METHOD}")
        except subprocess.CalledProcessError as install_error:
            raise RuntimeError("xsnow installation command failed") from install_error
        import xsnow  # noqa: F401  # pylint: disable=import-outside-toplevel
        print(f"xsnow {xsnow.__version__} installed successfully.")
else:
    INSTALL_XSNOW = INSTALL_XSNOW  # no-op so variable is defined for later cells

## Setup: Load Sample Data

Let's load sample data to work with:


In [None]:
import xsnow
import numpy as np

# Load sample data
print("Loading sample data...")
try:
    ds = xsnow.single_profile_timeseries()
    print("✅ Data loaded!")
    print(f"Dimensions: {dict(ds.sizes)}")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Install: pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    ds = None


## Part 1: Selecting Data

You can select parts of your data by label, by position, or by depth:

### Selecting by Label with `.sel()`

Use `.sel()` when you know the label (like a date or location name):

**What you'll see:** The code below selects data for a specific location and time. The result is a smaller dataset containing only the selected data.


In [None]:
# Select a specific location
if len(ds.coords['location']) > 0:
    location_name = ds.coords['location'].values[0]
    ds_site = ds.sel(location=location_name)

# Select a specific time (or time range)
if len(ds.coords['time']) > 0:
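    # Get all time stamps
    times = ds.coords['time'].values
    
    # Select a single time
    ds_single_time = ds.sel(time=times[0])
    
    # Select a time range using slice (label slices include both endpoints).
    # Cap the end index so datasets with fewer than 10 time steps don't raise an IndexError.
    last_idx = min(9, len(times) - 1)
    ds_time_range = ds.sel(time=slice(times[0], times[last_idx]))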
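
To see label-based selection in isolation, here is a minimal sketch on a small synthetic `xarray.Dataset`. The dataset, the location name `"SITE_A"`, and the dates are made up for illustration and are not part of xsnow's sample data. It shows that a scalar label drops the selected dimension, while a label slice keeps it and includes both endpoints.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 1 location, 4 daily time steps, 3 layers
times = pd.date_range("2024-02-01", periods=4, freq="D")
toy = xr.Dataset(
    {"density": (("location", "time", "layer"), np.arange(12.0).reshape(1, 4, 3))},
    coords={"location": ["SITE_A"], "time": times, "layer": [0, 1, 2]},
)

# A scalar label drops the selected dimension
one_site = toy.sel(location="SITE_A")
one_day = toy.sel(time=times[1])

# A label slice keeps the dimension and includes BOTH endpoints
two_days = toy.sel(time=slice("2024-02-02", "2024-02-03"))

print(dict(one_site.sizes))
print(dict(two_days.sizes))
```

Because label slices are inclusive, `two_days` contains two time steps, unlike a Python list slice, which would exclude the end.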


### Selecting by Index with `.isel()`

Use `.isel()` when you want to select by position (first, second, etc.):

**What you'll see:** These examples select the first time step, the first five time steps, the surface layer, and several layers at once. Selecting a single index drops that dimension; a slice or a list keeps the dimension but shortens it.


In [None]:
# Get first time step
ds_first = ds.isel(time=0)

# Get first 5 time steps
ds_first5 = ds.isel(time=slice(0, 5))

# Get surface layer (layer 0)
surface = ds.isel(layer=0)

# Get multiple layers
top_layers = ds.isel(layer=[0, 1, 2])  # Top 3 layers
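
The effect of each kind of positional index on the dimensions can be checked on a tiny synthetic array. This is a sketch with made-up values, not xsnow data; the array `da` and its shape are assumptions for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array: 4 time steps x 3 layers
da = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=("time", "layer"))

single = da.isel(time=0)            # integer index: 'time' dimension is dropped
window = da.isel(time=slice(0, 2))  # slice: 'time' is kept, end index excluded
picked = da.isel(layer=[0, 2])      # list: 'layer' is kept with length 2

print(single.dims)
print(window.shape)
print(picked.shape)
```

Positional slices follow Python's convention and exclude the end index, so `slice(0, 2)` returns two time steps.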


### Selecting by Depth (using 'z' coordinate)

Remember the `z` coordinate? We can use it to select layers by depth:

**What you'll see:** The code below filters for layers within 50 cm of the surface. With `drop=True`, layer indices that never fall inside this depth range are removed; any remaining entries outside the range are set to NaN.


In [None]:
# Select layers within 50 cm of surface (z between 0 and -0.5)
# Note: z is negative downward, so we want z >= -0.5
shallow = ds.where((ds.coords['z'] >= -0.5) & (ds.coords['z'] <= 0), drop=True)


## Part 2: Filtering Data with Conditions

You can filter data based on conditions (e.g., "show me only dense layers"). This is done with `.where()`:

**What you'll see:** The code below creates filtered datasets for layers that meet specific criteria (dense snow, cold temperatures). `.where()` replaces non-matching values with NaN, and `drop=True` removes only those coordinate labels where nothing matches, so the result can still contain NaN entries.


In [None]:
# Find layers with density > 300 kg/m³ (dense snow)
dense_mask = ds['density'] > 300
dense_layers = ds.where(dense_mask, drop=True)


# Find cold layers (temperature < -10°C)
if 'temperature' in ds.data_vars:
    cold_mask = ds['temperature'] < -10
    cold_layers = ds.where(cold_mask, drop=True)
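
The difference between masking and dropping is easiest to see on a small example. The sketch below uses made-up density values (not xsnow data) to show what `.where(..., drop=True)` keeps.

```python
import numpy as np
import xarray as xr

# Hypothetical densities (kg/m³): 2 time steps x 3 layers
density = xr.DataArray(
    [[120.0, 310.0, 350.0],
     [130.0, 280.0, 360.0]],
    dims=("time", "layer"),
    coords={"layer": [0, 1, 2]},
)

dense = density.where(density > 300, drop=True)

# Layer 0 never exceeds 300, so its label is dropped entirely
print(dense.layer.values)
# Layer 1 exceeds 300 at only one time step, so it is kept and the other entry is NaN
print(dense.values)
# count() ignores NaN and gives the number of matching entries
print(int(dense.count()))
```

If you only need to know how many entries match, summing the boolean mask (`(density > 300).sum()`) avoids creating the masked copy.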


## Part 3: Computing Profile-Level Summaries

Often you want to summarize layer-level data into a single value per profile. For example, "what's the average density in each profile?"

**What you'll see:** The code below computes the mean, maximum, minimum, and standard deviation of density across all layers for each profile. Each result no longer has the `layer` dimension: it is a profile-level variable (one value per profile per time step).


In [None]:
# Compute mean density across all layers for each profile
mean_density = ds['density'].mean(dim='layer')

# Other useful aggregations
max_density = ds['density'].max(dim='layer')  # Maximum density in profile
min_density = ds['density'].min(dim='layer')   # Minimum density in profile
std_density = ds['density'].std(dim='layer')   # Standard deviation


# Add as new variable to dataset
ds = ds.assign(mean_density=mean_density)
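
Layer arrays often contain NaN where a profile has fewer layers than the array can hold (that padding is an assumption here; check your own dataset). The sketch below, on made-up values, shows how xarray reductions treat NaN by default and how to count the valid layers.

```python
import numpy as np
import xarray as xr

# Hypothetical densities: the first profile has only two valid layers
density = xr.DataArray(
    [[100.0, 200.0, np.nan],
     [150.0, 250.0, 350.0]],
    dims=("time", "layer"),
)

mean_density = density.mean(dim="layer")               # NaN is skipped by default for floats
n_layers = density.count(dim="layer")                  # number of non-NaN layers per profile
strict_mean = density.mean(dim="layer", skipna=False)  # NaN propagates instead

print(mean_density.values)
print(n_layers.values)
print(strict_mean.values)
```

Pass `skipna=False` when a missing layer should invalidate the whole profile summary rather than be silently ignored.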


## Part 4: Calculating Snow Water Equivalent (SWE)

**Snow Water Equivalent (SWE)** is a critical metric: it tells you how much water is in the snowpack. SWE = density × thickness, summed over all layers.

### Understanding SWE

- **Why it matters**: SWE tells you the actual water content, not just snow depth
- **Units**: Typically mm or m of water
- **Calculation**: For each layer: density (kg/m³) × thickness (m) = water equivalent (kg/m² or mm)
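
As a quick sanity check of the formula, here is the arithmetic for a hypothetical three-layer profile in plain Python. The densities and thicknesses are invented for illustration.

```python
# Hypothetical three-layer profile
densities = [100.0, 250.0, 400.0]   # kg/m³
thicknesses = [0.20, 0.50, 0.30]    # m

# density × thickness per layer gives kg/m², which is numerically equal to mm of water
swe_mm = sum(rho * h for rho, h in zip(densities, thicknesses))

# Dividing by the density of water (1000 kg/m³) converts kg/m² to metres of water
swe_m = swe_mm / 1000.0

print(f"SWE: {swe_mm:.1f} mm = {swe_m:.3f} m")
```

Here 1.0 m of snow holds 20 + 125 + 120 = 265 mm of water, which is why snow depth alone is a poor proxy for water content.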

Let's compute it:


In [None]:
# Method 1: If we have layer thickness directly
if 'thickness' in ds.data_vars:
    # SWE = sum(density * thickness) over all layers
    swe = (ds['density'] * ds['thickness']).sum(dim='layer') / 1000.0  # Convert to m
    print(f"SWE calculated from thickness: {float(swe.values.flat[0]):.3f} m (first profile)")
    ds = ds.assign(SWE=swe)

# Method 2: Compute thickness from 'z' coordinate (depth)
elif 'z' in ds.coords and 'density' in ds.data_vars:
    # Layer thickness = difference in z between adjacent layers
    # For each layer, thickness = |z[i] - z[i+1]|, except last layer
    z = ds.coords['z']
    
    # Compute thickness by differencing z (absolute value since z is negative)
    z_diff = z.diff(dim='layer', label='upper')
    thickness = -z_diff  # Negative because z decreases downward
    
    # For the last layer, estimate thickness from remaining depth
    # This is a simplified approach - in practice, you'd need more sophisticated handling
    if 'HS' in ds.data_vars:
        # Estimate the last layer's thickness from the depth of layer 0
        # (HS only gates this branch; its values are not used below)
        total_thickness = -z.isel(layer=0)  # Depth of first layer
        computed_thickness = thickness.sum(dim='layer', skipna=True)
        last_layer_thickness = total_thickness - computed_thickness
        
        # Fill NaN values in thickness with estimated last layer thickness
        thickness = thickness.fillna(last_layer_thickness)
        
        # Calculate SWE
        swe = (ds['density'] * thickness).sum(dim='layer') / 1000.0  # Convert to m
        print(f"SWE calculated from z coordinate: {float(swe.values.flat[0]):.3f} m (first profile)")
        ds = ds.assign(SWE=swe)
    else:
        print("Note: HS not available, using simplified thickness calculation")
        swe = (ds['density'] * thickness.fillna(0)).sum(dim='layer') / 1000.0
        ds = ds.assign(SWE=swe)

# Method 3: Report SWE, whether it was computed above or was already in the dataset
if 'SWE' in ds.data_vars or 'swe' in ds.data_vars:
    swe_var = ds['SWE'] if 'SWE' in ds.data_vars else ds['swe']
    print(f"SWE in dataset: {float(swe_var.values.flat[0]):.3f} m (first profile)")


## Part 5: Identifying Weak Layers

Weak layers are critical for avalanche forecasting. A weak layer is typically characterized by:
- Low density
- Specific grain types (e.g., faceted crystals)
- Low strength/hardness

Let's create a simple weak layer identifier:


In [None]:
# Define weak layer criteria
# This is a simplified example - real weak layer identification is more complex

weak_mask = None

if 'density' in ds.data_vars:
    # Weak layers often have low density (< 150 kg/m³)
    low_density = ds['density'] < 150
    weak_mask = low_density  # Simplified: density-only criterion

    if 'grain_type' in ds.data_vars:
        # Some grain types indicate weak layers (this depends on your grain type coding)
        # Example: grain_type == 4 might indicate faceted crystals
        # Note: Check your data's grain type coding scheme!
        # weak_grain = ds['grain_type'] == 4  # Example only
        # weak_mask = low_density & weak_grain
        pass
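
Once a boolean mask exists, it can be combined with other criteria and reduced to profile-level answers. The sketch below uses made-up values, and the grain type code `FACET_CODE = 4` is purely hypothetical; substitute the coding scheme your data actually uses.

```python
import xarray as xr

# Hypothetical layer data: 2 time steps x 3 layers
density = xr.DataArray(
    [[90.0, 140.0, 300.0],
     [200.0, 160.0, 320.0]],
    dims=("time", "layer"),
)
grain_type = xr.DataArray(
    [[1, 4, 2],
     [4, 4, 2]],
    dims=("time", "layer"),
)

FACET_CODE = 4  # hypothetical code for faceted crystals

# Combine criteria element-wise with & (both must hold)
weak = (density < 150) & (grain_type == FACET_CODE)

n_weak = weak.sum(dim="layer")    # number of weak layers per profile
has_weak = weak.any(dim="layer")  # does the profile contain any weak layer?

print(n_weak.values)
print(has_weak.values)
```

Comparisons against NaN evaluate to `False`, so missing layers are never flagged as weak by this kind of mask.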


## Part 6: Time Series Operations

xsnow makes it easy to analyze how snowpack properties change over time.

### Time-based Aggregations


In [None]:
# Compute mean over time (average profile)
if 'density' in ds.data_vars:
    mean_density_over_time = ds['density'].mean(dim='time')

# Compute time series of profile-level variables
if 'HS' in ds.data_vars:
    hs_series = ds['HS'].isel(location=0, slope=0, realization=0)

# Compute change over time (difference between consecutive time steps)
if 'HS' in ds.data_vars:
    hs_change = ds['HS'].diff(dim='time')
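
A short sketch of what `.diff()` returns may help when interpreting `hs_change`. The snow heights and dates below are invented; the point is that the result has one fewer time step and, by default, each difference is labelled with the later of the two time stamps.

```python
import pandas as pd
import xarray as xr

# Hypothetical daily snow height series (m)
times = pd.date_range("2024-02-01", periods=4, freq="D")
hs = xr.DataArray([1.00, 1.10, 1.05, 1.25], dims="time", coords={"time": times})

# Difference between consecutive time steps
hs_change = hs.diff(dim="time")

print(hs_change.sizes["time"])       # one fewer than the input
print(hs_change.time.values[0])      # first difference carries the second time stamp
print(hs_change.round(2).values)     # positive = accumulation, negative = settling or melt
```

Pass `label="lower"` to `.diff()` if you would rather attach each difference to the earlier time stamp.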


### Accessing Values as NumPy Arrays

Sometimes you need raw NumPy arrays for integration with other libraries:


In [None]:
# Get values as NumPy array
density_array = ds['density'].values

# Get values for a specific selection
surface_density = ds['density'].sel(layer=0).values
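
When handing arrays to other libraries, keep in mind that `.values` discards the dimension names and the axis order simply follows the order of `dims`. The sketch below, on a made-up array, shows one way to fix the axis order explicitly before extracting the values.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 4 time steps x 3 layers
da = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=("time", "layer"))

arr = da.values                                    # axes ordered as (time, layer)
arr_layer_first = da.transpose("layer", "time").values  # axes ordered as (layer, time)

print(type(arr).__name__, arr.shape)
print(arr_layer_first.shape)
```

Calling `.transpose()` with explicit dimension names makes the axis order independent of how the dataset happened to be stored.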


## Summary

✅ **What we learned:**

1. **Selecting data**: `.sel()` for labels, `.isel()` for positions
2. **Filtering**: `.where()` with conditions to find specific layers/profiles
3. **Aggregations**: `.mean()`, `.max()`, `.min()`, `.sum()` across dimensions
4. **SWE calculation**: Sum of density × thickness over layers
5. **Weak layer identification**: Using conditions to find problematic layers
6. **Time series**: Analyzing changes over time with `.diff()`, `.mean()`, etc.
7. **NumPy integration**: `.values` to get raw arrays when needed

## Key Operations Cheat Sheet

```python
# Selection
ds.sel(location="VIR1A", time="2024-02-01")  # By label
ds.isel(time=0, layer=0)                      # By position

# Filtering
ds.where(ds['density'] > 300, drop=True)      # Condition

# Aggregations
ds['density'].mean(dim='layer')               # Mean over layers
ds['HS'].max(dim='time')                      # Max over time

# Time operations
ds['HS'].diff(dim='time')                     # Change over time
ds.resample(time='1D').mean()                 # Resample

# NumPy
ds['density'].values                          # Get array
```

## Next Steps

Ready to visualize your data? Move on to:
- **03_visualization.ipynb**: Create plots and visualizations

## Exercises

1. Select data for a specific date and print the density profile
2. Find all layers with temperature below -5°C
3. Compute the mean density for the top 3 layers across all time steps
4. Calculate how much the snow height changed between the first and last time step
5. Identify profiles that have at least one layer with density < 100 kg/m³
