# 01: Introduction to xsnow and Loading Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/01_introduction_and_loading_data.ipynb)

This notebook introduces xsnow and the Python libraries it builds on, then shows how to load snowpack data.

## What You'll Learn

- What xsnow is and why it's useful
- Python fundamentals: NumPy, pandas, and xarray basics
- Understanding xsnow's 5-dimensional data model
- How to load .pro and .smet files
- Exploring dataset structure and metadata


### Learning objectives
- Understand how xsnow represents snowpack data across dimensions.
- Practice NumPy, pandas, and xarray operations that underpin xsnow workflows.
- Load the sample xsnow dataset and interpret its coordinates and variables.
- Inspect metadata to connect measurements to physical meaning.

**Prerequisites**
- [ ] Ability to run Python code cells in a notebook.
- [ ] Comfort with basic Python data structures (lists, dictionaries).
- [ ] Prior exposure to NumPy or pandas fundamentals.


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.

In [None]:
# Run.

%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow



## Part 1: What is xsnow?

**xsnow** is a Python library designed to make working with snowpack simulation data efficient and intuitive. It's built specifically for data from the SNOWPACK model (and other snow models), which outputs detailed information about snow layers over time.

### Why xsnow?

- **Handles complex file formats**: SNOWPACK outputs come in specialized formats (.pro, .smet) that xsnow can parse automatically
- **Organized data structure**: Instead of juggling hundreds of separate files, xsnow organizes everything into a single, coherent dataset
- **Powerful analysis tools**: Built-in functions for common snowpack analyses (SWE, weak layers, stability indices)
- **Built on proven libraries**: Uses xarray, NumPy, and pandas under the hood, so you get their full power

### The Big Picture

Think of xsnow as a translator: it takes raw SNOWPACK output files and converts them into a format that's easy to work with in Python. Instead of manually parsing text files, you get a clean, organized dataset where you can ask questions like "Show me all weak layers on north-facing slopes after February 1st" with simple code.


## Part 2: Python Fundamentals for xsnow
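We'll use a show → run → explain rhythm so each library demo stays focused.

### NumPy: Working with Arrays
**Show.** We'll build one- and two-dimensional arrays of snow temperatures to see array basics.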


In [None]:
# Run.
import numpy as np

# Create a simple array
temperatures = np.array([-5, -3, -1, 0, -2])

# Arrays can be multi-dimensional
# Imagine 3 layers (rows), each with 3 temperature readings (columns)
layer_temps = np.array([[-5, -3, -1],  # Layer 0 (surface)
                        [-3, -2, -1],  # Layer 1
                        [-2, -1, 0]])  # Layer 2


**Explain.** NumPy stores each array as one typed block of numbers, so a single expression can operate on every snow layer at once without a Python loop.
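
To make the "whole array at once" idea concrete, here is a minimal sketch using the same 3×3 layout as the cell above. Treating the columns as three successive readings per layer is an assumption made only for illustration.

```python
import numpy as np

# Rows are layers; columns are assumed to be three successive readings (°C)
layer_temps = np.array([[-5, -3, -1],
                        [-3, -2, -1],
                        [-2, -1, 0]])

# Mean temperature of each layer (collapse the column axis)
mean_per_layer = layer_temps.mean(axis=1)

# Boolean mask: which readings have reached 0 °C?
at_melting = layer_temps >= 0

# Vectorized arithmetic: convert every value to Kelvin in one step
layer_temps_k = layer_temps + 273.15

print(mean_per_layer)
print(int(at_melting.sum()), "reading(s) at or above 0 °C")
```

Each line replaces what would otherwise be a nested loop over layers and readings.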


In [None]:
# Check for understanding: confirm array shapes
assert temperatures.shape == (5,)
assert layer_temps.shape == (3, 3)


### Pandas: Working with Tables
**Show.** We'll assemble a tiny table of snow depth observations to see DataFrame basics.


In [None]:
# Run.
import pandas as pd

# Create a simple table (DataFrame)
data = {
    'time': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'snow_depth': [50, 55, 60],  # cm
    'temperature': [-5, -3, -1]   # °C
}
df = pd.DataFrame(data)


**Explain.** A pandas DataFrame keeps measurements aligned by column so we can sort, filter, and visualize with ease.
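
As a rough illustration of the sorting and filtering mentioned above, the sketch below rebuilds the same small table and applies a few common operations. The `depth_change` column is a name introduced here for the example only.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'snow_depth': [50, 55, 60],   # cm
    'temperature': [-5, -3, -1],  # °C
})

# Convert the date strings to real timestamps and use them as the index
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# Filter: days with more than 50 cm of snow
deep_days = df[df['snow_depth'] > 50]

# Derived column: day-to-day change in snow depth (the first row has no predecessor)
df['depth_change'] = df['snow_depth'].diff()

# Sort by temperature, warmest first
warmest_first = df.sort_values('temperature', ascending=False)

print(deep_days)
print(warmest_first.index[0])
```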


In [None]:
# Check for understanding: DataFrame columns and values
assert list(df.columns) == ['time', 'snow_depth', 'temperature']
assert df['snow_depth'].max() == 60


### xarray: Multi-Dimensional Labeled Arrays
**Show.** We'll label a 3D temperature block to preview xsnow's labeled, multi-dimensional style.


In [None]:
# Run.
import xarray as xr

# Create a simple xarray DataArray (like a NumPy array with labels)
# Let's say we have temperature data for 2 locations, 3 time steps, 2 layers
temps = np.array([[[-5, -3],   # Location 0, Time 0, Layers [0, 1]
                   [-4, -2],   # Location 0, Time 1
                   [-3, -1]],  # Location 0, Time 2
                  [[-6, -4],   # Location 1, Time 0
                   [-5, -3],   # Location 1, Time 1
                   [-4, -2]]]) # Location 1, Time 2

# Create labeled dimensions
da = xr.DataArray(
    temps,
    dims=['location', 'time', 'layer'],
    coords={
        'location': ['Station_A', 'Station_B'],
        'time': pd.date_range('2024-01-01', periods=3, freq='D'),
        'layer': [0, 1]  # Layer 0 = surface, Layer 1 = deeper
    },
    name='temperature'
)



**Explain.** xarray wraps the NumPy array with coordinate labels so xsnow can align measurements across stations, time, and layers.
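
The practical payoff of labels is that you can select by name instead of by axis position. The following sketch rebuilds the same array and compares label-based selection (`sel`) with position-based selection (`isel`).

```python
import numpy as np
import pandas as pd
import xarray as xr

temps = np.array([[[-5, -3], [-4, -2], [-3, -1]],
                  [[-6, -4], [-5, -3], [-4, -2]]])
da = xr.DataArray(
    temps,
    dims=['location', 'time', 'layer'],
    coords={
        'location': ['Station_A', 'Station_B'],
        'time': pd.date_range('2024-01-01', periods=3, freq='D'),
        'layer': [0, 1],
    },
    name='temperature',
)

# Label-based selection: one station on one date
by_label = da.sel(location='Station_B', time=pd.Timestamp('2024-01-02'))

# Position-based selection: the same slice by integer index
by_position = da.isel(location=1, time=1)

# Reduce over a named dimension instead of remembering axis numbers
mean_over_time = da.mean(dim='time')

print(by_label.values)
print(mean_over_time.sel(location='Station_A').values)
```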


In [None]:
# Check for understanding: labeled access
assert da.sel(location='Station_A', layer=0).shape == (3,)
assert 'time' in da.coords


**Key xarray Concepts:**
- **Dimensions**: The axes of your data (location, time, layer)
- **Coordinates**: Labels for each dimension (e.g., station names, dates)
- **DataArray**: A single variable with dimensions (like our temperature above)
- **Dataset**: A collection of DataArrays that share dimensions

xsnow uses xarray's Dataset structure to organize all snowpack variables together!
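
To illustrate the DataArray/Dataset distinction, here is a small sketch in plain xarray. The variable values are invented, and `toy` is not an xsnow object; it only shows how a Dataset bundles variables that share dimensions.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2024-01-01', periods=3, freq='D')

# Two illustrative variables on the same (time, layer) grid
temperature = xr.DataArray(
    np.array([[-5.0, -3.0], [-4.0, -2.0], [-3.0, -1.0]]),
    dims=['time', 'layer'],
    coords={'time': times, 'layer': [0, 1]},
)
density = xr.DataArray(
    np.array([[120.0, 250.0], [140.0, 255.0], [160.0, 260.0]]),
    dims=['time', 'layer'],
    coords={'time': times, 'layer': [0, 1]},
)

# A Dataset bundles variables that share dimensions and coordinates
toy = xr.Dataset({'temperature': temperature, 'density': density})

# One selection applies to every variable at once
surface = toy.sel(layer=0)

print(list(toy.data_vars))
print(surface['density'].values)
```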


## Part 3: Understanding xsnow's Data Model

xsnow organizes snowpack data using **5 key dimensions**. This might sound complex, but it's actually very logical once you understand it.

### The 5 Dimensions

1. **location**: The site or grid point (e.g., "VIR1A", "Station_1")
2. **time**: When the profile was measured/simulated
3. **slope**: Different slope aspects at the same location (north-facing, south-facing, etc.)
4. **realization**: Different model runs or scenarios (for ensemble runs)
5. **layer**: The vertical layers within the snowpack (layer 0 = surface, higher numbers = deeper)

### Why This Structure?

This structure allows you to ask powerful questions:
- "Show me density profiles for all locations on February 1st"
- "Compare north vs south-facing slopes"
- "Find weak layers across all time steps"
- All without writing loops!
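
The questions above can be sketched on a synthetic five-dimensional array built with plain xarray. Everything here is made up for illustration: the site names, the random densities, and especially the use of "lowest density" as a placeholder criterion, which is not a weak-layer definition and not an xsnow function.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a layer-level variable; names and values are invented
rng = np.random.default_rng(0)
density = xr.DataArray(
    rng.uniform(100, 400, size=(2, 4, 2, 1, 3)),
    dims=['location', 'time', 'slope', 'realization', 'layer'],
    coords={
        'location': ['Site_1', 'Site_2'],
        'time': pd.date_range('2024-01-30', periods=4, freq='D'),
        'slope': ['north', 'south'],
        'realization': [0],
        'layer': [0, 1, 2],
    },
    name='density',
)

# "Density profiles for all locations on February 1st"
feb1 = density.sel(time=pd.Timestamp('2024-02-01'))

# "Compare north vs south-facing slopes": mean difference per location
north_minus_south = (
    density.sel(slope='north') - density.sel(slope='south')
).mean(dim=['time', 'realization', 'layer'])

# "Find ... across all time steps": index of the lowest-density layer per profile
lightest_layer = density.argmin(dim='layer')

print(feb1.dims)
print(north_minus_south.dims)
print(lightest_layer.dims)
```

None of the three queries needs an explicit loop; the dimension names carry the bookkeeping.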

### Profile-level vs Layer-level Variables

- **Profile-level**: Properties of the entire snowpack (e.g., total snow height HS). These don't vary by layer.
- **Layer-level**: Properties of individual layers (e.g., density, temperature). These vary by layer.
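
One way to picture the difference: reducing a layer-level variable over `layer` yields a profile-level one. The sketch below uses a hypothetical `thickness` variable with invented values; only the name `HS` comes from the text above.

```python
import numpy as np
import xarray as xr

# Made-up layer thicknesses (m) for 2 time steps x 3 layers
thickness = xr.DataArray(
    np.array([[0.10, 0.30, 0.50],
              [0.20, 0.30, 0.50]]),
    dims=['time', 'layer'],
    name='thickness',
)

# Summing over `layer` collapses a layer-level variable to a profile-level one
snow_height = thickness.sum(dim='layer')

toy = xr.Dataset({'thickness': thickness, 'HS': snow_height})

# Classify each variable by whether it carries the `layer` dimension
for name, var in toy.data_vars.items():
    level = 'layer-level' if 'layer' in var.dims else 'profile-level'
    print(name, var.dims, level)
```

Checking for `'layer' in var.dims` is a simple test you can apply to any variable to see which group it belongs to.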

Let's see this in action once we load some data!


## Part 4: Loading Data with xsnow
**Show.** We'll call `xsnow.single_profile_timeseries()` to grab a tiny bundled dataset for experimentation.


In [None]:
# Run.
import xsnow

# Load xsnow's lightweight sample time series dataset
# This returns an xsnowDataset with a time series of snow profiles
print("Loading xsnow sample data...")
print("Using xsnow.single_profile_timeseries() - a lightweight time series dataset")
print()

try:
    ds = xsnow.single_profile_timeseries()
    print("✅ Data loaded successfully!")
    print("\nDataset summary:")
    print(ds)
    print("\nDataset dimensions:")
    print(f"  {dict(ds.dims)}")
    print("\nDataset coordinates:")
    for coord in ds.coords:
        print(f"  {coord}: {ds.coords[coord].shape}")
    print("\nDataset variables:")
    for var in ds.data_vars:
        print(f"  {var}: {ds[var].dims}")
except Exception as e:
    print(f"❌ Error loading sample data: {e}")
    print("\nMake sure xsnow is properly installed:")
    print("  pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    ds = None


**Explain.** The helper returns an `xsnowDataset` preloaded with a time series of snow profiles, so you can explore without hunting for files.


In [None]:
# Check for understanding: dataset availability
assert ds is not None, 'Re-run the install cell if the sample dataset failed to load.'
assert 'time' in ds.dims


### Understanding the Dataset Structure
**Show.** Let's preview the core dimensions, coordinates, and variables that xsnow tracks.


In [None]:
# Run.
if ds is None:
    raise RuntimeError('Load the dataset above before inspecting it.')

print('Dimensions:', dict(ds.dims))
print('Coordinates:', list(ds.coords))
print('Variables:', list(ds.data_vars))

**Explain.** The printed summary confirms xsnow tracks layers (`layer`/`z`), time, and measurement variables side by side.


In [None]:
# Check for understanding: dimensionality
assert 'layer' in ds.dims
assert len(ds.data_vars) > 0


### Inspecting Specific Variables
**Show.** We'll peek at layer-based and profile-based variables to see how xsnow differentiates them.


In [None]:
# Run.
if ds is None:
    raise RuntimeError('Load the dataset above before selecting variables.')

if 'density' in ds.data_vars:
    density = ds['density']
    print('Density dims:', density.dims)
    print('Density sample:', float(density.isel(layer=0).mean()))

if 'HS' in ds.data_vars:
    hs = ds['HS']
    print('Snow height dims:', hs.dims)
    print('Snow height sample:', float(hs.isel(time=0).mean()))

**Explain.** Layer-level variables such as `density` carry the `layer` dimension, while profile-level summaries such as `HS` have no `layer` dimension and vary only over dimensions like time and location.


In [None]:
# Check for understanding: variable membership
assert ('density' in ds.data_vars) or ('HS' in ds.data_vars)


### Understanding Metadata
**Show.** We'll read variable and dataset attributes to capture units and processing notes.


In [None]:
# Run.
if ds is None:
    raise RuntimeError('Load the dataset above before reading metadata.')

if 'density' in ds.data_vars:
    print(ds['density'].attrs)

print('Dataset metadata:', ds.attrs)

**Explain.** Attributes annotate each series with units and descriptions, which keeps scientific context alongside the numbers.
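
Here is a minimal sketch of reading attributes defensively, using a hand-built xarray array rather than the sample dataset. The attribute keys and values (`units`, `long_name`, `kg m-3`) are illustrative assumptions; the keys in a real xsnow dataset may differ.

```python
import numpy as np
import xarray as xr

density = xr.DataArray(
    np.array([120.0, 250.0, 310.0]),
    dims=['layer'],
    attrs={'units': 'kg m-3', 'long_name': 'snow density'},  # illustrative attrs
)

# .attrs is a plain dict, so .get() avoids a KeyError when a key is missing
units = density.attrs.get('units', 'unknown')
label = f"{density.attrs.get('long_name', 'value')} [{units}]"
print(label)

# Reductions may drop attrs depending on the xarray version;
# keep_attrs=True asks explicitly for them to be carried through
mean_kept = density.mean(keep_attrs=True)
print(mean_kept.attrs)
```

Building axis labels from `attrs` like this keeps plots consistent with the metadata.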


In [None]:
# Check for understanding: metadata presence
assert isinstance(ds.attrs, dict)
if 'density' in ds.data_vars:
    assert 'units' in ds['density'].attrs or ds['density'].attrs != {}


### Loading Multiple Files
**Show.** Here's how you'd combine several `.pro` or `.smet` files once you have them locally.


In [None]:
# Run.
example_paths = ['data/station1.pro', 'data/station2.pro']
print('Call xsnow.read(example_paths) when the files are available locally.')
print('Or point xsnow.read at a directory to auto-discover compatible files.')

**Explain.** `xsnow.read` merges the inputs into one dataset and aligns their coordinates, so you can focus on analysis instead of bookkeeping.
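
A cautious way to try this is to check which files exist before reading. The sketch below assumes, as the cell above states, that `xsnow.read` accepts a list of paths; `existing_files` and `ds_multi` are names introduced here for illustration only.

```python
from pathlib import Path

example_paths = ['data/station1.pro', 'data/station2.pro']

def existing_files(paths):
    """Return only the paths that point to files on disk."""
    return [p for p in paths if Path(p).is_file()]

available = existing_files(example_paths)
if available:
    import xsnow  # imported lazily so the sketch runs even without xsnow installed
    # Assumption taken from the text above: xsnow.read accepts a list of paths
    ds_multi = xsnow.read(available)
    print(ds_multi)
else:
    print('No example files found; place .pro files under data/ first.')
```

Filtering first gives a clear message when files are missing instead of an error from deep inside the reader.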


In [None]:
# Check for understanding: example path hints
assert all(p.startswith('data/') for p in example_paths)


### The Special `z` Coordinate
**Show.** Let's grab the vertical coordinate so you can map layers to real-world depths.


In [None]:
# Run.
if ds is None:
    raise RuntimeError('Load the dataset above before accessing z.')

z = ds.coords['z']
print('z coordinate head:', z.values[:5])
print('Units:', z.attrs.get('units', 'unknown'))

**Explain.** Positive-down `z` values map each layer to depth, which keeps stratigraphy analyses consistent.
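
To show how a depth coordinate can drive a selection, here is a sketch on a hand-built profile in plain xarray. It follows the positive-down convention described above and assumes depths in metres; the values are invented, and the shape of `z` in a real xsnow dataset may differ from this one-dimensional toy.

```python
import numpy as np
import xarray as xr

# Made-up single profile: 4 layers, layer 0 at the surface
temperature = xr.DataArray(
    np.array([-8.0, -5.0, -3.0, -1.0]),
    dims=['layer'],
    coords={
        'layer': [0, 1, 2, 3],
        # Assumed for this sketch: depth below the surface in metres,
        # increasing downward
        'z': ('layer', np.array([0.05, 0.20, 0.45, 0.80]), {'units': 'm'}),
    },
    name='temperature',
)

# Keep only layers within the top 30 cm
near_surface = temperature.where(temperature['z'] <= 0.30, drop=True)

print(near_surface['layer'].values)
print(near_surface.values)
```

Because `z` travels with the data as a coordinate, the depth filter needs no separate lookup table.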


In [None]:
# Check for understanding: depth axis
assert 'z' in ds.coords
assert 'layer' in z.dims


### Play
Experiment by smoothing a temperature time series with different rolling-window sizes. Keep it light so execution stays fast.


In [None]:
# Run.
window = 3  # Try values between 2 and 5
if ds is None:
    raise RuntimeError('Load the dataset above before playing with smoothing.')

if 'temperature' in ds.data_vars:
    series = ds['temperature'].isel(location=0, layer=0).to_series()
else:
    series = ds.to_array().isel(variable=0, location=0, layer=0).to_series()
smoothed = series.rolling(window=window, min_periods=1, center=True).mean()
print(smoothed.head())


## Practice
Try the prompts before peeking at the solutions.


1. Load another xsnow helper dataset and compare its dimensions to the time-series set.
2. Plot the mean density over time for one location.
3. Inspect metadata for a meteorological variable and explain its units in words.


<details>
<summary>Solutions</summary>

1. Use `xsnow.profile_example()` (or another helper) and print `dataset.dims`. Compare the keys to `ds.dims`.
2. Select `ds['density']`, average with `.mean(dim='layer')`, and plot with `.to_pandas().plot()`.
3. Access something like `ds['TA'].attrs['units']` to describe the temperature units.

</details>


## Summary
- NumPy, pandas, and xarray each support xsnow's layered data structures.
- xsnow helper loaders give you quick practice datasets to explore.
- Metadata and coordinates keep physical context attached to every array.
