# 02: Basic Operations and Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/02_basic_operations_and_analysis.ipynb)

Now that you understand xsnow's data structure, let's learn how to work with the data: selecting, filtering, and performing basic analyses.

## What You'll Learn

- Selecting data by location, time, and layer
- Filtering data with conditions
- Computing profile-level summaries
- Calculating snow water equivalent (SWE)
- Identifying weak layers
- Time series operations


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.


In [None]:

%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow



## Setup: Load Sample Data

Let's load sample data to work with:


In [None]:
import xsnow
import numpy as np

# Load sample data
print("Loading sample data...")
try:
    ds = xsnow.single_profile_timeseries()
    print("✅ Data loaded!")
    print(f"Dimensions: {dict(ds.dims)}")
except Exception as e:
    print(f"❌ Error: {e}")
    print("Install: pip install git+https://gitlab.com/avacollabra/postprocessing/xsnow")
    ds = None


## Part 1: Selecting Data

You can select parts of your data by label, by position, or by depth:

### Selecting by Label with `.sel()`

Use `.sel()` when you know the label (like a date or location name):


In [None]:
# Select a specific location
if len(ds.coords['location']) > 0:
    location_name = ds.coords['location'].values[0]
    ds_site = ds.sel(location=location_name)
    print(f"Selected location: {location_name}")
    print(f"Shape after selection: {dict(ds_site.dims)}")

# Select a specific time (or time range)
if len(ds.coords['time']) > 0:
    # Get first and last time
    times = ds.coords['time'].values
    print(f"\nTime range: {times[0]} to {times[-1]}")
    
    # Select a single time
    ds_single_time = ds.sel(time=times[0])
    print(f"Selected single time: {times[0]}")
    
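    # Select a time range using slice (up to the first 10 time steps)
    end_time = times[min(9, len(times) - 1)]  # avoid IndexError on short series
    ds_time_range = ds.sel(time=slice(times[0], end_time))
    print(f"Selected time range: {times[0]} to {end_time}")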


### Selecting by Index with `.isel()`

Use `.isel()` when you want to select by position (first, second, etc.):


In [None]:
# Get first time step
ds_first = ds.isel(time=0)
print("First time step selected")

# Get first 5 time steps
ds_first5 = ds.isel(time=slice(0, 5))
print("First 5 time steps selected")

# Get surface layer (layer 0)
surface = ds.isel(layer=0)
print("Surface layer selected")

# Get multiple layers
top_layers = ds.isel(layer=[0, 1, 2])  # Top 3 layers
print("Top 3 layers selected")
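
To make the difference between label-based and position-based selection concrete, here is a minimal sketch on a small synthetic dataset. The `toy` dataset, its dates, and its density values are made up for illustration and are not xsnow output.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 4 daily time steps x 3 layers (not real xsnow output)
times = pd.date_range("2024-02-01", periods=4, freq="D")
toy = xr.Dataset(
    {"density": (("time", "layer"), np.arange(12, dtype=float).reshape(4, 3) * 10 + 100)},
    coords={"time": times, "layer": [0, 1, 2]},
)

# Label-based: pick the profile dated 2024-02-03
by_label = toy.sel(time="2024-02-03")

# Position-based: pick the third time step (index 2), which is the same profile
by_position = toy.isel(time=2)

# Label slices include both endpoints; positional slices exclude the stop index
label_range = toy.sel(time=slice("2024-02-01", "2024-02-03"))  # 3 time steps
index_range = toy.isel(time=slice(0, 3))                        # 3 time steps

print(by_label["density"].values)
print(label_range.sizes["time"], index_range.sizes["time"])
```

Note the asymmetry: `slice("2024-02-01", "2024-02-03")` with `.sel()` keeps the end label, while `slice(0, 3)` with `.isel()` stops before index 3.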


### Selecting by Depth (using 'z' coordinate)

Remember the `z` coordinate? We can use it to select layers by depth:


In [None]:
# Select layers within 50 cm of surface (z between 0 and -0.5)
# Note: z is negative downward, so we want z >= -0.5
shallow = ds.where((ds.coords['z'] >= -0.5) & (ds.coords['z'] <= 0), drop=True)
print("Selected layers within 50 cm of surface")
print(f"Number of layers: {shallow.dims.get('layer', 'N/A')}")


## Part 2: Filtering Data with Conditions

You can filter data based on conditions (e.g., "show me only dense layers"). This is done with `.where()`:


In [None]:
# Find layers with density > 300 kg/m³ (dense snow)
dense_mask = ds['density'] > 300
dense_layers = ds.where(dense_mask, drop=True)

print("Filtered for dense layers (> 300 kg/m³)")
print(f"Original dataset layers: {ds.sizes.get('layer', 'N/A')}")
# drop=True only removes layer indices where no value matches, so the remaining
# layer count is not the number of dense layers; count the mask hits for that
print(f"Layer indices kept after filtering: {dense_layers.sizes.get('layer', 'N/A')}")
print(f"Dense layer values across all profiles: {int(dense_mask.sum())}")

# Find cold layers (temperature < -10°C)
if 'temperature' in ds.data_vars:
    cold_mask = ds['temperature'] < -10
    cold_layers = ds.where(cold_mask, drop=True)
    print(f"\nCold layer values (< -10°C): {int(cold_mask.sum())}")
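
It helps to see exactly what `.where(..., drop=True)` keeps. The sketch below uses a hypothetical 2 × 3 density array (made-up values): non-matching cells become NaN, and a layer index is dropped only if it never matches at any time step.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 time steps x 3 layers of density in kg/m³
density = xr.DataArray(
    [[120.0, 280.0, 350.0],
     [140.0, 310.0, 400.0]],
    dims=("time", "layer"),
    coords={"layer": [0, 1, 2]},
    name="density",
)

mask = density > 300
filtered = density.where(mask, drop=True)

# Layer 0 never exceeds 300, so it is dropped. Layer 1 is kept because it
# matches at the second time step, and holds NaN at the first.
print(filtered.sizes["layer"])        # layer indices that match at least once
print(int(mask.sum()))                # individual cells that match
print(mask.sum(dim="layer").values)   # matching layers per time step
```

To answer "how many layers meet the criterion", sum the boolean mask rather than reading the size of the filtered `layer` dimension.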


## Part 3: Computing Profile-Level Summaries

Often you want to summarize layer-level data into a single value per profile. For example, "what's the average density in each profile?"


In [None]:
# Compute mean density across all layers for each profile
mean_density = ds['density'].mean(dim='layer')
print("Mean density per profile:")
print(f"Shape: {mean_density.shape}")
print(f"Dimensions: {mean_density.dims}")
print(f"Example values: {mean_density.values.flatten()[:5] if mean_density.size > 5 else mean_density.values}")

# Other useful aggregations
max_density = ds['density'].max(dim='layer')  # Maximum density in profile
min_density = ds['density'].min(dim='layer')   # Minimum density in profile
std_density = ds['density'].std(dim='layer')   # Standard deviation

print("\nOther aggregations:")
print(f"Max density shape: {max_density.shape}")
print(f"Min density shape: {min_density.shape}")

# Add as new variable to dataset
ds = ds.assign(mean_density=mean_density)
print("\n✅ Added 'mean_density' as new variable to dataset")
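
Profiles rarely all have the same number of layers. The sketch below assumes that absent layers are stored as NaN (an assumption about how the data is padded, so check your own dataset) and shows that xarray reductions skip NaN by default. The values are made up.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 profiles padded to 3 layers; NaN marks an absent layer
density = xr.DataArray(
    [[100.0, 200.0, 300.0],
     [150.0, 250.0, np.nan]],
    dims=("time", "layer"),
)

mean_density = density.mean(dim="layer")  # NaN values are skipped by default
n_layers = density.count(dim="layer")     # number of non-NaN layers per profile

print(mean_density.values)
print(n_layers.values)
```

Both profiles average to 200 kg/m³ even though the second has only two layers, because the NaN entry is excluded from both the sum and the count.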


## Part 4: Calculating Snow Water Equivalent (SWE)

**Snow Water Equivalent (SWE)** is a critical metric: it tells you how much water is in the snowpack. SWE = density × thickness, summed over all layers.

### Understanding SWE

- **Why it matters**: SWE tells you the actual water content, not just snow depth
- **Units**: Typically mm or m of water
- **Calculation**: For each layer: density (kg/m³) × thickness (m) = water equivalent (kg/m² or mm)

Let's compute it:


In [None]:
# Method 1: If we have layer thickness directly
if 'thickness' in ds.data_vars:
    # SWE = sum(density * thickness) over all layers
    swe = (ds['density'] * ds['thickness']).sum(dim='layer') / 1000.0  # Convert to m
    print("✅ Computed SWE from density and thickness")

# Method 2: Compute thickness from 'z' coordinate (depth)
elif 'z' in ds.coords:
    # Layer thickness = difference in z between adjacent layers
    # For each layer, thickness = |z[i] - z[i+1]|, except last layer
    z = ds.coords['z']
    
    # Compute thickness by differencing z (absolute value since z is negative)
    # This is a simplified approach - in reality, you'd need to handle the last layer carefully
    z_diff = z.diff(dim='layer', label='upper')
    thickness = -z_diff  # Negative because z decreases downward
    
    # For the last layer, we need to estimate thickness
    # This is a simplified calculation
    print("Computing SWE from z coordinate...")
    print("Note: This is a simplified calculation. Real implementation would handle edge cases.")
    
    # Alternative: If HS (total height) is available, we can use it
    if 'HS' in ds.data_vars:
        # Approximate: assume each profile's layers are equally thick
        # This is not perfect but gives an idea
        print("Using HS to approximate layer thicknesses...")
        # Count only layers that hold data, since absent layers may be NaN
        n_layers = ds['density'].notnull().sum(dim='layer')
        swe_approx = (ds['density'] * ds['HS'] / n_layers).sum(dim='layer') / 1000.0
        print("Approximate SWE computed (simplified method)")

# Method 3: If SWE is already in the dataset
if 'SWE' in ds.data_vars or 'swe' in ds.data_vars:
    swe_var = ds['SWE'] if 'SWE' in ds.data_vars else ds['swe']
    print(f"✅ SWE already in dataset: {swe_var.attrs.get('units', 'units not specified')}")
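
To check the formula by hand, here is a minimal worked example on a hypothetical three-layer profile. The densities and thicknesses are invented so the arithmetic is easy to follow.

```python
import xarray as xr

# Hypothetical three-layer profile (values chosen for easy arithmetic)
density = xr.DataArray([100.0, 250.0, 400.0], dims="layer")   # kg/m³
thickness = xr.DataArray([0.20, 0.50, 0.30], dims="layer")    # m

# Per-layer water equivalent: kg/m³ × m = kg/m², numerically equal to mm of water
layer_swe_mm = density * thickness

swe_mm = float(layer_swe_mm.sum(dim="layer"))   # 20 + 125 + 120 = 265 mm
swe_m = swe_mm / 1000.0                         # about 0.265 m of water

# Bulk density of the whole pack = total mass per area / total snow height
bulk_density = swe_mm / float(thickness.sum())  # about 265 kg/m³ for a 1 m pack
print(swe_mm, swe_m, bulk_density)
```

One metre of snow holding 265 mm of water corresponds to a bulk density of about 265 kg/m³, which is a quick sanity check on any SWE result.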


## Part 5: Identifying Weak Layers

Weak layers are critical for avalanche forecasting. A weak layer is typically characterized by:
- Low density
- Specific grain types (e.g., faceted crystals)
- Low strength/hardness

Let's create a simple weak layer identifier:


In [None]:
# Define weak layer criteria
# This is a simplified example - real weak layer identification is more complex

weak_mask = None

if 'density' in ds.data_vars:
    # Weak layers often have low density (< 150 kg/m³)
    low_density = ds['density'] < 150
    weak_mask = low_density  # Simplified: density is the only criterion for now

    if 'grain_type' in ds.data_vars:
        # Some grain types indicate weak layers (this depends on your grain type coding)
        # Example: grain_type == 4 might indicate faceted crystals
        # Note: Check your data's grain type coding scheme!
        print("Note: Grain type codes vary by SNOWPACK version. Adjust codes as needed.")
        # weak_grain = ds['grain_type'] == 4  # Example only
        # weak_mask = low_density & weak_grain

    print(f"Layer values flagged as potentially weak: {int(weak_mask.sum())}")
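
Combining criteria is a matter of joining boolean masks with `&`. The sketch below runs on made-up data, and the grain type code `4` for faceted crystals is purely an assumption for illustration; substitute the codes your data actually uses.

```python
import xarray as xr

# Hypothetical toy profile: 2 time steps x 4 layers of density in kg/m³
density = xr.DataArray(
    [[90.0, 140.0, 220.0, 310.0],
     [110.0, 160.0, 230.0, 320.0]],
    dims=("time", "layer"),
)

# ASSUMPTION: code 4 stands for faceted crystals; check your own coding scheme
FACETS = 4
grain_type = xr.DataArray(
    [[1, 4, 4, 3],
     [1, 4, 4, 3]],
    dims=("time", "layer"),
)

low_density = density < 150
weak_grain = grain_type == FACETS

# A layer is flagged only when both criteria hold
weak = low_density & weak_grain

weak_per_profile = weak.sum(dim="layer")  # flagged layers per time step
has_weak = weak.any(dim="layer")          # does the profile contain any?
print(weak_per_profile.values, has_weak.values)
```

In this toy case the faceted layer at index 1 is flagged at the first time step only; by the second step its density has risen above the threshold.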


## Part 6: Time Series Operations

xsnow makes it easy to analyze how snowpack properties change over time.

### Time-based Aggregations


In [None]:
# Compute mean over time (average profile)
if 'density' in ds.data_vars:
    mean_density_over_time = ds['density'].mean(dim='time')
    print("Mean density profile (averaged over all time steps):")
    print(f"Shape: {mean_density_over_time.shape}")
    print(f"Dimensions: {mean_density_over_time.dims}")

# Compute time series of profile-level variables
if 'HS' in ds.data_vars:
    hs_series = ds['HS'].isel(location=0, slope=0, realization=0)
    print(f"\nSnow height time series:")
    print(f"Mean HS: {hs_series.mean().values:.2f} m")
    print(f"Max HS: {hs_series.max().values:.2f} m")
    print(f"Min HS: {hs_series.min().values:.2f} m")

# Compute change over time (difference between consecutive time steps)
if 'HS' in ds.data_vars:
    hs_change = ds['HS'].diff(dim='time')
    print(f"\nSnow height change per time step:")
    print(f"Mean change: {hs_change.mean().values:.3f} m")
    print("(Positive = snow accumulating, Negative = snow height decreasing through settling or melt)")
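
The same time operations can be tried on a tiny synthetic series. This sketch uses a hypothetical snow height record at 12-hour spacing (made-up values) to show `.diff()` and daily resampling side by side.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical snow height series: 4 values at 12-hour spacing, in metres
times = pd.date_range("2024-02-01", periods=4, freq=pd.Timedelta(hours=12))
hs = xr.DataArray([1.00, 1.10, 1.06, 1.02], dims="time", coords={"time": times}, name="HS")

# Change between consecutive time steps; the result has one fewer entry
change = hs.diff(dim="time")

# Daily mean: the two values that fall on each calendar day are averaged
daily = hs.resample(time="1D").mean()

print(change.values)
print(daily.values)
```

`.diff()` labels each difference with the later of the two time stamps by default, so the first time step has no entry in `change`.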


### Accessing Values as NumPy Arrays

Sometimes you need raw NumPy arrays for integration with other libraries:


In [None]:
# Get values as NumPy array
density_array = ds['density'].values
print(f"Density as NumPy array:")
print(f"  Shape: {density_array.shape}")
print(f"  Type: {type(density_array)}")
print(f"  Dtype: {density_array.dtype}")

# Get values for a specific selection (layer 0 by position, as in Part 1)
surface_density = ds['density'].isel(layer=0).values
print(f"\nSurface layer density array:")
print(f"  Shape: {surface_density.shape}")
print(f"  First few values: {surface_density.flatten()[:5]}")


## Summary

✅ **What we learned:**

1. **Selecting data**: `.sel()` for labels, `.isel()` for positions
2. **Filtering**: `.where()` with conditions to find specific layers/profiles
3. **Aggregations**: `.mean()`, `.max()`, `.min()`, `.sum()` across dimensions
4. **SWE calculation**: Sum of density × thickness over layers
5. **Weak layer identification**: Using conditions to find problematic layers
6. **Time series**: Analyzing changes over time with `.diff()`, `.mean()`, etc.
7. **NumPy integration**: `.values` to get raw arrays when needed

## Key Operations Cheat Sheet

```python
# Selection
ds.sel(location="VIR1A", time="2024-02-01")  # By label
ds.isel(time=0, layer=0)                      # By position

# Filtering
ds.where(ds['density'] > 300, drop=True)      # Condition

# Aggregations
ds['density'].mean(dim='layer')               # Mean over layers
ds['HS'].max(dim='time')                      # Max over time

# Time operations
ds['HS'].diff(dim='time')                     # Change over time
ds.resample(time='1D').mean()                 # Resample

# NumPy
ds['density'].values                          # Get array
```

## Next Steps

Ready to visualize your data? Move on to:
- **03_visualization.ipynb**: Create plots and visualizations

## Exercises

1. Select data for a specific date and print the density profile
2. Find all layers with temperature below -5°C
3. Compute the mean density for the top 3 layers across all time steps
4. Calculate how much the snow height changed between the first and last time step
5. Identify profiles that have at least one layer with density < 100 kg/m³
