# 02: Basic Operations and Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/02_basic_operations_and_analysis.ipynb)

Now that you understand xsnow's data structure, let's learn how to work with the data: selecting, filtering, and performing basic analyses.

## What You'll Learn

- Selecting data by location, time, and layer
- Filtering data with conditions
- Computing profile-level summaries
- Calculating snow water equivalent (SWE)
- Identifying weak layers
- Time series operations


### Learning objectives
- Select xsnow data by location, time, and depth to isolate relevant layers.
- Filter layers with conditional masks to spotlight weak or dense snow.
- Compute profile summaries and SWE estimates for decision support.
- Translate xsnow objects to familiar NumPy arrays for downstream tools.

**Prerequisites**
- [ ] Familiarity with xsnow basics from Notebook 01.
- [ ] Comfort indexing pandas/xarray objects.
- [ ] Ability to interpret scientific units like density and SWE.


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.


In [None]:

%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow



## Setup: Load Sample Data
**Show.** We'll pull the bundled dataset so every selection example has real coordinates.


In [None]:
# Run.
import xsnow
import numpy as np

# Load sample data
print("Loading sample data...")
try:
    ds = xsnow.single_profile_timeseries()
    print("✅ Data loaded!")
    print(f"Dimensions: {dict(ds.sizes)}")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Install: pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    ds = None


**Explain.** A quick helper call gives us a multi-dimensional dataset with location, slope, realization, time, and layer axes.


In [None]:
# Check for understanding: dataset ready
assert ds is not None, 'Install xsnow before proceeding.'
assert 'location' in ds.dims


## Part 1: Selecting Data
**Show.** We'll grab slices by location, time, and depth to practice navigation.


In [None]:
# Run.
# Select a specific location
if len(ds.coords['location']) > 0:
    location_name = ds.coords['location'].values[0]
    ds_site = ds.sel(location=location_name)

# Select a specific time (or time range)
if len(ds.coords['time']) > 0:
    # Get first and last time
    times = ds.coords['time'].values
    
    # Select a single time
    ds_single_time = ds.sel(time=times[0])
    
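    # Select a time range using slice (label-based slices include both endpoints)
    last_idx = min(9, len(times) - 1)  # Guard against series shorter than 10 steps
    ds_time_range = ds.sel(time=slice(times[0], times[last_idx]))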


**Explain.** Named selections with `.sel` keep coordinate labels intact so you always know which site and timestamp you're analyzing.


In [None]:
# Check for understanding: selection shapes
assert 'location' not in ds_site.dims
assert ds_single_time.sizes['layer'] == ds.sizes['layer']


### Selecting by Index with `.isel()`
**Show.** Use `.isel()` to pull positional slices along time and layer axes.


In [None]:
# Run.
# Get first time step
ds_first = ds.isel(time=0)

# Get first 5 time steps
ds_first5 = ds.isel(time=slice(0, 5))

# Get surface layer (layer 0)
surface = ds.isel(layer=0)

# Get multiple layers
top_layers = ds.isel(layer=[0, 1, 2])  # Top 3 layers


**Explain.** `.isel` provides quick positional slicing when you just need the first few entries without worrying about coordinate labels.


In [None]:
# Check for understanding: positional slicing
assert 'time' not in ds_first.dims  # A scalar index drops the dimension
assert top_layers.sizes['layer'] == 3
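
One difference between the two selection styles is easier to see on toy data: label-based slices in `.sel` include both endpoints, while positional slices in `.isel` follow Python's half-open convention. The sketch below uses a small synthetic `xarray.Dataset` (the values are made up, not xsnow output) and assumes xsnow datasets follow plain xarray semantics here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a snow profile time series (not real xsnow data)
times = pd.date_range("2024-01-01", periods=6, freq="D")
toy = xr.Dataset(
    {"density": (("time", "layer"), np.full((6, 3), 200.0))},
    coords={"time": times, "layer": [0, 1, 2]},
)

# Label-based slice: both endpoints are included
by_label = toy.sel(time=slice("2024-01-01", "2024-01-03"))

# Positional slice: the stop index is excluded, as in plain Python
by_position = toy.isel(time=slice(0, 2))

# A scalar selection drops the dimension; a list selection keeps it
scalar_pick = toy.isel(time=0)
list_pick = toy.isel(time=[0])

print(by_label.sizes["time"], by_position.sizes["time"])
print("time" in scalar_pick.sizes, "time" in list_pick.sizes)
```

If a later step expects a `time` axis, pass a list or slice rather than a scalar index.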


### Selecting by Depth (using 'z' coordinate)
**Show.** Filter layers by physical depth using the signed `z` coordinate.


In [None]:
# Run.
# Select layers within 50 cm of surface (z between 0 and -0.5)
# Note: z is negative downward, so we want z >= -0.5
shallow = ds.where((ds.coords['z'] >= -0.5) & (ds.coords['z'] <= 0), drop=True)


**Explain.** Combining boolean masks with `.where(..., drop=True)` keeps the physical layers you care about: data values outside the mask become NaN, and positions where the mask is false everywhere are dropped.


In [None]:
# Check for understanding: depth filter
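# Coordinates are not masked by .where, so check z only where data survived
kept_z = shallow.coords['z'].where(shallow['density'].notnull())
assert float(kept_z.min()) >= -0.5
assert shallow.sizes.get('layer', 0) <= ds.sizes['layer']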
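
A detail that is easy to miss: in plain xarray, `.where` replaces non-matching entries of data variables with NaN, and `drop=True` only removes positions where the condition is false everywhere along the other dimensions. The toy example below uses made-up numbers and assumes xsnow keeps these xarray semantics. It shows a layer that survives the filter because it is shallow at one time step only.

```python
import numpy as np
import xarray as xr

# Two time steps, three layers; z is depth in metres (negative downward)
z = xr.DataArray(
    [[-0.1, -0.4, -0.9],
     [-0.2, -0.6, -1.2]],
    dims=("time", "layer"),
)
toy = xr.Dataset(
    {"density": (("time", "layer"), [[120.0, 250.0, 350.0],
                                      [130.0, 260.0, 360.0]])},
    coords={"z": z},
)

shallow = toy.where(toy["z"] >= -0.5, drop=True)

# Layer 2 is deeper than 0.5 m at both times, so it is dropped entirely.
# Layer 1 is shallow only at the first time, so it is kept and its density
# is set to NaN at the second time.
print(shallow.sizes["layer"])
print(shallow["density"].values)
```

Because the `z` coordinate itself is not masked, the kept layer still carries its deeper `z` value at the second time step; use the NaN pattern in the data variables to see which entries passed the filter.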


## Part 2: Filtering Data with Conditions
**Show.** Build boolean masks to spotlight layers that meet safety criteria.


In [None]:
# Run.
# Find layers with density > 300 kg/m³ (dense snow)
dense_mask = ds['density'] > 300
dense_layers = ds.where(dense_mask, drop=True)


# Find cold layers (temperature < -10°C)
if 'temperature' in ds.data_vars:
    cold_mask = ds['temperature'] < -10
    cold_layers = ds.where(cold_mask, drop=True)


**Explain.** Threshold masks on density or temperature pick out dense layers (such as crusts or heavy slabs) and cold layers while preserving their coordinates.


In [None]:
# Check for understanding: dense layer mask
assert float(dense_layers['density'].min()) >= 300


## Part 3: Computing Profile-Level Summaries
**Show.** Aggregate layer data into profile summaries for quick dashboards.


In [None]:
# Run.
# Compute mean density across all layers for each profile
mean_density = ds['density'].mean(dim='layer')

# Other useful aggregations
max_density = ds['density'].max(dim='layer')  # Maximum density in profile
min_density = ds['density'].min(dim='layer')   # Minimum density in profile
std_density = ds['density'].std(dim='layer')   # Standard deviation


# Add as new variable to dataset
ds = ds.assign(mean_density=mean_density)


**Explain.** Aggregations collapse the layer axis so you can compare profiles over time or across locations at a glance.


In [None]:
# Check for understanding: mean density shape
assert 'layer' not in mean_density.dims
assert 'mean_density' in ds.data_vars


## Part 4: Calculating Snow Water Equivalent (SWE)
**Show.** Explore strategies to approximate SWE depending on available variables.


In [None]:
# Run.
# Method 1: If we have layer thickness directly
if 'thickness' in ds.data_vars:
    # SWE = sum(density * thickness) over all layers
    # Assuming thickness in m: kg/m³ × m = kg/m² (numerically mm w.e.); / 1000 gives m w.e.
    swe = (ds['density'] * ds['thickness']).sum(dim='layer') / 1000.0

# Method 2: Compute thickness from 'z' coordinate (depth)
elif 'z' in ds.coords:
    # Layer thickness = difference in z between adjacent layers
    # For each layer, thickness = |z[i] - z[i+1]|, except last layer
    z = ds.coords['z']
    
    # Compute thickness by differencing z (absolute value since z is negative)
    # This is a simplified approach - in reality, you'd need to handle the last layer carefully
    z_diff = z.diff(dim='layer', label='upper')
    thickness = -z_diff  # Negative because z decreases downward
    
    # Differencing n depths yields only n-1 thicknesses, so the last layer's
    # thickness must come from elsewhere (for example, total snow height)
    
    # Alternative: if HS (total snow height) is available, use it directly
    if 'HS' in ds.data_vars:
        # Rough approximation: assume every layer is equally thick (HS / n_layers).
        # Assumes HS is in metres; adjust the conversion if it is stored in cm.
        n_layers = ds.sizes['layer']
        swe_approx = (ds['density'] * ds['HS'] / n_layers).sum(dim='layer') / 1000.0

# Method 3: If SWE is already in the dataset
if 'SWE' in ds.data_vars or 'swe' in ds.data_vars:
    swe_var = ds['SWE'] if 'SWE' in ds.data_vars else ds['swe']


**Explain.** SWE estimates depend on which supporting variables exist, so the notebook demonstrates fallback strategies.


In [None]:
# Check for understanding: SWE placeholders
if 'SWE' in ds.data_vars:
    assert swe_var.dims == ds['SWE'].dims
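
To make the units in Method 1 concrete, here is a worked example with three made-up layers. It assumes density in kg/m³ and thickness in metres; the product summed over layers is a mass per area in kg/m², which is numerically equal to millimetres of water equivalent. The second half derives thickness from layer-interface depths, a hypothetical layout in which `z_interfaces` holds the top and bottom of every layer. Check how your own dataset defines `z` before reusing it.

```python
import numpy as np

# Three hypothetical layers, listed from the surface downward
density = np.array([100.0, 250.0, 400.0])    # kg/m^3
thickness = np.array([0.20, 0.30, 0.50])     # m

swe_kg_m2 = float(np.sum(density * thickness))  # kg/m^2, numerically mm w.e.
swe_m = swe_kg_m2 / 1000.0                       # m w.e. (water density 1000 kg/m^3)

# Thickness from interface depths (negative downward): n layers need n + 1 interfaces
z_interfaces = np.array([0.0, -0.20, -0.50, -1.00])   # m
thickness_from_z = -np.diff(z_interfaces)

print(f"SWE: {swe_kg_m2:.0f} mm w.e. = {swe_m:.3f} m w.e.")
print(thickness_from_z)
```

With only n depth values for n layers, differencing yields n − 1 thicknesses, which is why Method 2 needs extra information such as `HS` to close the profile.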


## Part 5: Identifying Weak Layers
**Show.** Combine density and grain clues to flag weak layers.


In [None]:
# Run.
# Define weak layer criteria
# This is a simplified example - real weak layer identification is more complex

weak_mask = None

if 'density' in ds.data_vars:
    # Weak layers often have low density (< 150 kg/m³)
    low_density = ds['density'] < 150
    weak_mask = low_density  # Simplified: density alone defines the mask
    
    if 'grain_type' in ds.data_vars:
        # Some grain types indicate weak layers (this depends on your grain type coding)
        # Example: grain_type == 4 might indicate faceted crystals
        # Note: Check your data's grain type coding scheme!
        # weak_mask = low_density & (ds['grain_type'] == 4)  # Example only
        pass


**Explain.** Even simple heuristics (like low density) can highlight suspect layers for deeper investigation.


In [None]:
# Check for understanding: weak mask created
assert weak_mask is not None
assert weak_mask.dims == ds['density'].dims
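
Combining two criteria works the same way on any boolean arrays. The sketch below uses plain NumPy and a made-up profile; the grain code `4` and the name `FACET_CODE` are placeholders taken from the example comment above, not a real coding scheme.

```python
import numpy as np

# Hypothetical single profile, surface first; grain code 4 is a placeholder
# for "faceted crystals" and must be replaced by your data's coding scheme
density = np.array([90.0, 140.0, 220.0, 130.0, 310.0])   # kg/m^3
grain_type = np.array([1, 4, 2, 4, 3])
FACET_CODE = 4

low_density = density < 150
weak_grain = grain_type == FACET_CODE

# Element-wise AND: a layer is flagged only if it meets both criteria
weak_mask = low_density & weak_grain
weak_indices = np.flatnonzero(weak_mask)

print(weak_mask)
print(weak_indices)
```

When the comparisons are written inline, wrap each one in parentheses, because `&` binds more tightly than `<` and `==`.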


## Part 6: Time Series Operations
**Show.** Analyze change over time by averaging, slicing, and differencing.


In [None]:
# Run.
# Compute mean over time (average profile)
if 'density' in ds.data_vars:
    mean_density_over_time = ds['density'].mean(dim='time')

# Compute time series of profile-level variables
if 'HS' in ds.data_vars:
    hs_series = ds['HS'].isel(location=0, slope=0, realization=0)

# Compute change over time (difference between consecutive time steps)
if 'HS' in ds.data_vars:
    hs_change = ds['HS'].diff(dim='time')


**Explain.** Time-based reductions and differences expose evolving snowpack structure.


In [None]:
# Check for understanding: time operations
if 'density' in ds.data_vars:
    assert 'layer' in mean_density_over_time.dims
if 'HS' in ds.data_vars:
    assert hs_change.sizes['time'] == ds.sizes['time'] - 1
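
The behaviour of `.diff` is easiest to check on a short series. This sketch uses a hypothetical snow height series (made-up values, units assumed to be cm) to show that the result is one step shorter, that each difference is labelled with the later timestamp by default, and how to keep only the increases.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical snow height series in cm (not xsnow output)
times = pd.date_range("2024-01-01", periods=5, freq="D")
hs = xr.DataArray([100.0, 112.0, 110.0, 125.0, 121.0],
                  dims="time", coords={"time": times})

hs_change = hs.diff(dim="time")   # one element shorter than hs

# Keep only increases; decreases (settling, melt) become 0
gains = hs_change.clip(min=0)
total_gain = float(gains.sum())

print(hs_change.values)
print(hs_change["time"].values[0])
print(total_gain)
```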


### Accessing Values as NumPy Arrays
**Show.** Convert xarray-backed data into bare NumPy arrays for external tools.


In [None]:
# Run.
# Get values as NumPy array
density_array = ds['density'].values

# Get values for a specific selection
surface_density = ds['density'].sel(layer=0).values


**Explain.** Pulling NumPy arrays lets you hand data to scikit-learn, SciPy, or custom simulations.


In [None]:
# Check for understanding: numpy access
assert density_array.ndim >= 1
assert surface_density.ndim == density_array.ndim - 1  # Selecting one layer drops that axis
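
Once the labels are gone, axis order is the only thing that tells dimensions apart, so it helps to fix the order or look up the axis by name before exporting. A minimal sketch on a toy array with made-up sizes:

```python
import numpy as np
import xarray as xr

# Toy array with made-up sizes: 4 time steps, 3 layers
toy = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=("time", "layer"))

# Put dimensions in a known order before dropping the labels
layer_first = toy.transpose("layer", "time").values

# Look up an axis by name instead of hard-coding its position
time_axis = toy.get_axis_num("time")
time_mean = toy.values.mean(axis=time_axis)

print(layer_first.shape)
print(time_mean)
```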


### Play
Adjust the density threshold or depth window to explore how selections change. Keep runs lightweight by sampling a single location.


In [None]:
# Run.
density_cutoff = 280  # Try between 250 and 320
depth_limit = -0.6  # Try between -0.2 and -1.0

site = ds.isel(location=0)
# Build each mask from the object being filtered so dimensions stay aligned
subset = site.where(site['density'] > density_cutoff, drop=True)
shallow_subset = subset.where(subset.coords['z'] >= depth_limit, drop=True)
print('Layers retained:', shallow_subset.sizes.get('layer', 0))


## Practice
Test yourself with these prompts before opening the solutions.


1. Create a mask for layers where temperature is warmer than -5°C and inspect the remaining density.
2. Compute the rolling 3-step mean of `HS` for a single location.
3. Export a selection of `density` to a pandas DataFrame and describe the index levels.


<details>
<summary>Solutions</summary>

1. `warm = ds.where(ds['temperature'] > -5, drop=True)` then inspect `warm['density']`.
2. `ds['HS'].isel(location=0, slope=0, realization=0).rolling(time=3).mean()` demonstrates smoothing.
3. `ds['density'].isel(location=0).to_dataframe().head()` reveals the MultiIndex structure.

</details>
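
A note on the rolling mean in prompt 2: with the default settings, a rolling window returns NaN until it holds a full set of samples. The sketch below, on a hypothetical snow height series with made-up values, compares the default with `min_periods=1`, which averages whatever is available at the start of the series.

```python
import numpy as np
import xarray as xr

# Hypothetical snow height series (made-up values)
hs = xr.DataArray([100.0, 112.0, 110.0, 125.0, 121.0], dims="time")

# Default: a value is produced only once the window holds 3 valid samples
smooth = hs.rolling(time=3).mean()

# min_periods=1 averages whatever is available at the start of the series
partial = hs.rolling(time=3, min_periods=1).mean()

print(smooth.values)
print(partial.values)
```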


## Summary
- Coordinate-based selections and boolean masks isolate layers for closer study.
- Aggregations and SWE estimates condense stratigraphy into actionable metrics.
- Exporting to NumPy unlocks interoperability with broader scientific tools.
