# 01: Introduction to xsnow and Loading Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/01_introduction_and_loading_data.ipynb)

This notebook introduces you to xsnow, the Python libraries it builds on, and how to load snowpack data.

## What You'll Learn

- What xsnow is and why it's useful
- Python fundamentals: NumPy, pandas, and xarray basics
- Understanding xsnow's 5-dimensional data model
- How to load .pro and .smet files
- Exploring dataset structure and metadata


## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies. If you're running locally and have already installed xsnow, you can skip this cell.

In [None]:
# Install xsnow from git (run this cell if using Colab or if you haven't installed yet)
# Uncomment the lines below to install:

# %pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
# %pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow

print("To install xsnow, uncomment the pip install lines above and run this cell.")
print("If xsnow is already installed, you can skip this cell.")
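
Before moving on, it can help to confirm which libraries are importable in the current environment. The sketch below uses only the standard library. The helper name `check_packages` is made up for this tutorial, and it assumes each package's distribution name matches its import name (if it does not, the version is reported as "unknown").

```python
from importlib import metadata, util

def check_packages(names):
    """Return {name: installed version, or None if the package is not importable}."""
    status = {}
    for name in names:
        if util.find_spec(name) is None:
            status[name] = None  # not importable in this environment
            continue
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "unknown"  # importable, but no matching distribution metadata
    return status

report = check_packages(["numpy", "pandas", "xarray", "xsnow"])
for name, version in report.items():
    print(f"{name:8s} {'MISSING' if version is None else version}")
```

Any package reported as `MISSING` needs the install lines above before the rest of the notebook will run.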


## Part 1: What is xsnow?

**xsnow** is a Python library designed to make working with snowpack simulation data efficient and intuitive. It is built primarily for output from the SNOWPACK model (and other snow models), which describes the state of individual snow layers over time.

### Why xsnow?

- **Handles complex file formats**: SNOWPACK outputs come in specialized formats (.pro, .smet) that xsnow can parse automatically
- **Organized data structure**: Instead of juggling hundreds of separate files, xsnow organizes everything into a single, coherent dataset
- **Powerful analysis tools**: Built-in functions for common snowpack analyses (SWE, weak layers, stability indices)
- **Built on proven libraries**: Uses xarray, NumPy, and pandas under the hood, so you get their full power

### The Big Picture

Think of xsnow as a translator: it takes raw SNOWPACK output files and converts them into a format that's easy to work with in Python. Instead of manually parsing text files, you get a clean, organized dataset where you can ask questions like "Show me all weak layers on north-facing slopes after February 1st" with simple code.


## Part 2: Python Fundamentals for xsnow

xsnow builds on several important Python libraries. Let's get familiar with the basics you'll need.

### NumPy: Working with Arrays

NumPy provides arrays (like lists, but faster and more powerful) for numerical computing.


In [None]:
import numpy as np

# Create a simple array
temperatures = np.array([-5, -3, -1, 0, -2])
print("Temperatures:", temperatures)
print("Mean temperature:", np.mean(temperatures))
print("Shape:", temperatures.shape)

# Arrays can be multi-dimensional
# Imagine 3 layers (rows), each with a temperature at 3 time steps (columns)
layer_temps = np.array([[-5, -3, -1],  # Layer 0 (surface)
                        [-3, -2, -1],  # Layer 1
                        [-2, -1, 0]])  # Layer 2
print("\nLayer temperatures (3 layers, 3 time steps):")
print(layer_temps)
print("Shape:", layer_temps.shape)  # (layers, time)
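
Multi-dimensional arrays become useful once you reduce along one axis at a time. The short sketch below repeats the `layer_temps` array so it runs on its own, then averages over time and over layers separately.

```python
import numpy as np

# Rows are layers, columns are time steps (same layout as above)
layer_temps = np.array([[-5, -3, -1],
                        [-3, -2, -1],
                        [-2, -1, 0]])

mean_per_layer = layer_temps.mean(axis=1)    # average over time, one value per layer
mean_per_time = layer_temps.mean(axis=0)     # average over layers, one value per time step
warmest_layer = layer_temps.argmax(axis=0)   # row index of the warmest layer at each time step

print("Mean per layer:", mean_per_layer)
print("Mean per time step:", mean_per_time)
print("Warmest layer per time step:", warmest_layer)
```

Keeping track of which axis means what is the main weakness of plain NumPy arrays, and it is the problem that labeled dimensions in xarray solve.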


### Pandas: Working with Tables

pandas handles tabular data (like spreadsheets). xsnow uses xarray, which is better suited to multi-dimensional data, but the two libraries share ideas such as label-based selection, so knowing pandas helps.


In [None]:
import pandas as pd

# Create a simple table (DataFrame)
data = {
    'time': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'snow_depth': [50, 55, 60],  # cm
    'temperature': [-5, -3, -1]   # °C
}
df = pd.DataFrame(data)
print("Snow data table:")
print(df)
print("\nMean snow depth:", df['snow_depth'].mean())
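
Snow data is almost always a time series, so it is worth seeing how pandas treats real timestamps. The sketch below rebuilds the same table, converts the `time` strings to datetimes, and uses them as the index. The `depth_change` column is an illustrative addition.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "snow_depth": [50, 55, 60],    # cm
    "temperature": [-5, -3, -1],   # °C
})

# Parse the strings into timestamps and use them as the row labels
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Label-based selection of a single day
print(df.loc[pd.Timestamp("2024-01-02")])

# Day-to-day change in snow depth (first row has no previous day, so it is NaN)
df["depth_change"] = df["snow_depth"].diff()
print(df)
```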


### xarray: Multi-Dimensional Labeled Arrays

**xarray is the foundation of xsnow!** It's like pandas, but for multi-dimensional data. This is perfect for snowpack data, which has:
- Multiple locations
- Multiple time steps
- Multiple layers
- Multiple variables (density, temperature, etc.)

Let's see a simple example:


In [None]:
import xarray as xr

# Create a simple xarray DataArray (like a NumPy array with labels)
# Let's say we have temperature data for 2 locations, 3 time steps, 2 layers
temps = np.array([[[-5, -3],   # Location 0, Time 0, Layers [0, 1]
                   [-4, -2],   # Location 0, Time 1
                   [-3, -1]],  # Location 0, Time 2
                  [[-6, -4],   # Location 1, Time 0
                   [-5, -3],   # Location 1, Time 1
                   [-4, -2]]]) # Location 1, Time 2

# Create labeled dimensions
da = xr.DataArray(
    temps,
    dims=['location', 'time', 'layer'],
    coords={
        'location': ['Station_A', 'Station_B'],
        'time': pd.date_range('2024-01-01', periods=3, freq='D'),
        'layer': [0, 1]  # Layer 0 = surface, Layer 1 = deeper
    },
    name='temperature'
)

print("xarray DataArray:")
print(da)
print("\nSelect Station_A on Jan 2:")
print(da.sel(location='Station_A', time='2024-01-02'))


**Key xarray Concepts:**
- **Dimensions**: The axes of your data (location, time, layer)
- **Coordinates**: Labels for each dimension (e.g., station names, dates)
- **DataArray**: A single variable with dimensions (like our temperature above)
- **Dataset**: A collection of DataArrays that share dimensions

xsnow uses xarray's Dataset structure to organize all snowpack variables together!
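
To make the Dataset idea concrete, here is a minimal sketch built with plain xarray. It is a toy, not xsnow output: the variable names and values are invented, and it uses only two dimensions to stay readable.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2024-01-01", periods=3, freq="D")

toy = xr.Dataset(
    data_vars={
        # One value per layer at each time step
        "temperature": (("time", "layer"),
                        np.array([[-5.0, -3.0], [-4.0, -2.0], [-3.0, -1.0]])),
        # One value per time step (no layer dimension)
        "snow_depth": (("time",), np.array([50.0, 55.0, 60.0])),
    },
    coords={"time": times, "layer": [0, 1]},
)

print(toy)
print("Mean over layers:", toy["temperature"].mean(dim="layer").values)
print("Snow depth on Jan 2:", toy["snow_depth"].sel(time=times[1]).item())
```

Both variables share the `time` coordinate, so a single selection by time applies to the whole Dataset at once.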


## Part 3: Understanding xsnow's Data Model

xsnow organizes snowpack data using **5 key dimensions**. Five dimensions may sound like a lot, but each one answers a simple question: where, when, on which slope, in which model run, and at what position in the snowpack.

### The 5 Dimensions

1. **location**: The site or grid point (e.g., "VIR1A", "Station_1")
2. **time**: When the profile was measured/simulated
3. **slope**: Different slope aspects at the same location (north-facing, south-facing, etc.)
4. **realization**: Different model runs or scenarios (for ensemble runs)
5. **layer**: The vertical layers within the snowpack (layer 0 = surface, higher numbers = deeper)

### Why This Structure?

This structure allows you to ask powerful questions:
- "Show me density profiles for all locations on February 1st"
- "Compare north vs south-facing slopes"
- "Find weak layers across all time steps"
- All without writing loops!
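
The questions above can be sketched with plain xarray on synthetic data. Everything here is assumed for illustration: the random density values, the `"north"`/`"south"` slope labels, and the density threshold, which only stands in for a real weak-layer criterion.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
dims = ("location", "time", "slope", "realization", "layer")
coords = {
    "location": ["Station_A", "Station_B"],
    "time": pd.date_range("2024-01-30", periods=4, freq="D"),
    "slope": ["north", "south"],   # illustrative labels only
    "realization": [0],
    "layer": [0, 1, 2],
}
shape = tuple(len(coords[d]) for d in dims)
density = xr.DataArray(rng.uniform(100, 400, size=shape),
                       dims=dims, coords=coords, name="density")

# "Show me density profiles for all locations on February 1st"
feb1 = density.sel(time=pd.Timestamp("2024-02-01"))

# "Compare north vs south-facing slopes": mean difference per location
diff = (density.sel(slope="north") - density.sel(slope="south")).mean(
    dim=("time", "realization", "layer")
)

# Stand-in for "find weak layers": keep values below a threshold, NaN elsewhere
low_density = density.where(density < 150)

print(feb1.dims)
print(diff.values)
print("Low-density entries:", int(low_density.count()))
```

Each query is a single expression. The dimension names do the work that nested loops would otherwise do.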

### Profile-level vs Layer-level Variables

- **Profile-level**: Properties of the entire snowpack (e.g., total snow height HS). These don't vary by layer.
- **Layer-level**: Properties of individual layers (e.g., density, temperature). These vary by layer.

Let's see this in action once we load some data!


## Part 4: Loading Data with xsnow

Now let's actually load some data! xsnow can read SNOWPACK output files in `.pro` (profile) and `.smet` (meteorological) formats.

### Understanding File Formats

- **`.pro` files**: Contain time series of snow profiles with layer-by-layer data
- **`.smet` files**: Contain time series of scalar variables (temperature, precipitation, etc.) without layers
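
If you are curious what these files look like before handing them to xsnow, a quick peek at the first few lines is often enough. The sketch below assumes the files are plain text and uses a made-up helper name, `peek`. It returns an empty list when the file is missing, so it is safe to run without sample data.

```python
from pathlib import Path

def peek(path, n_lines=5):
    """Return the first n_lines of a text file, or [] if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        # zip stops after n_lines, so large files are never read in full
        return [line.rstrip("\n") for _, line in zip(range(n_lines), fh)]

for line in peek("data/sample_profile.pro"):
    print(line)
```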

### Getting Sample Data

Before we can load data, you need sample files. See `data/README.md` for instructions on obtaining sample data. For this tutorial, we'll assume you have a file called `sample_profile.pro` in the `data/` directory.

**Note**: If you don't have sample data yet, the cells below check for the file first and, when it is missing, print what the output would look like instead of raising an error.


In [None]:
import xsnow
import os

# Check if we have sample data
data_dir = "data"
sample_file = os.path.join(data_dir, "sample_profile.pro")

print("Looking for sample data...")
if os.path.exists(sample_file):
    print(f"✅ Found: {sample_file}")
    print("Loading data with xsnow.read()...")
    # This is the main function for loading data!
    ds = xsnow.read(sample_file)
    print("\n✅ Data loaded successfully!")
    print("\nDataset summary:")
    print(ds)
else:
    print(f"❌ Sample file not found: {sample_file}")
    print("\nTo get sample data:")
    print("1. Check data/README.md for instructions")
    print("2. Download sample .pro or .smet files")
    print("3. Place them in the data/ directory")
    print("\nFor now, we'll show you what the structure looks like...")
    ds = None


### Understanding the Dataset Structure

When you load data with `xsnow.read()`, you get an **xsnowDataset**. This is a special wrapper around an xarray Dataset that's designed for snowpack data.

Let's explore what's inside:


In [None]:
if ds is not None:
    # Print the dataset (this shows dimensions, coordinates, and variables)
    print("=== DATASET OVERVIEW ===")
    print(ds)
    
    print("\n=== DIMENSIONS ===")
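    print(f"Dimensions: {dict(ds.sizes)}")
    print("\nWhat this means:")
    # .sizes maps dimension name -> length (newer xarray versions deprecate using Dataset.dims as a mapping)
    for dim, size in ds.sizes.items():
        print(f"  - {dim}: {size} values")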
    
    print("\n=== COORDINATES ===")
    print("Coordinates are labels for each dimension:")
    for coord_name in list(ds.coords.keys())[:5]:  # Show first 5
        coord = ds.coords[coord_name]
        # Print short coordinates in full; .size also works for scalar (0-d) coordinates
        preview = coord.values if coord.size < 10 else "..."
        print(f"  - {coord_name}: {preview}")
    
    print("\n=== DATA VARIABLES ===")
    print(f"Number of variables: {len(ds.data_vars)}")
    print("Some key variables:")
    for var_name in list(ds.data_vars.keys())[:10]:  # Show first 10
        var = ds[var_name]
        print(f"  - {var_name}: shape {var.shape}, dims {var.dims}")
else:
    print("""
    When you load data, you'll see something like:
    
    <xsnowDataset>
    Dimensions:  (location: 1, time: 373, slope: 1, realization: 1, layer: 12)
    Coordinates:
      * location       (location) <U5 'VIR1A'
      * time           (time) datetime64[ns] 2024-01-18 ... 2024-02-02
      * slope          (slope) int64 0
      * realization    (realization) int64 0
      * layer          (layer) int64 0 1 2 ... 11
    Data variables:
        density        (location, time, slope, realization, layer) float32 ...
        temperature    (location, time, slope, realization, layer) float32 ...
        HS             (location, time, slope, realization) float32 ...
        ...
    """)


### Inspecting Specific Variables

Let's look at individual variables to understand the difference between profile-level and layer-level data:


In [None]:
if ds is not None:
    # Layer-level variable (has 'layer' dimension)
    if 'density' in ds.data_vars:
        print("=== LAYER-LEVEL VARIABLE: density ===")
        density = ds['density']
        print(f"Dimensions: {density.dims}")
        print(f"Shape: {density.shape}")
        print(f"Units: {density.attrs.get('units', 'not specified')}")
        print("\nThis variable has a value for EACH layer in EACH profile")
        print(f"Example: First profile, all layers: {density.isel(location=0, time=0, slope=0, realization=0).values}")
    
    # Profile-level variable (no 'layer' dimension)
    if 'HS' in ds.data_vars:
        print("\n=== PROFILE-LEVEL VARIABLE: HS (snow height) ===")
        hs = ds['HS']
        print(f"Dimensions: {hs.dims}")
        print(f"Shape: {hs.shape}")
        print(f"Units: {hs.attrs.get('units', 'not specified')}")
        print("\nThis variable has ONE value per profile (total snow height)")
        # Slicing with [:5] is safe even when there are fewer than 5 time steps
        print(f"Example: First few time steps: {hs.isel(location=0, slope=0, realization=0).values[:5]}")
else:
    print("""
    Layer-level variables (like density, temperature):
    - Have dimensions: (location, time, slope, realization, layer)
    - One value per layer per profile
    
    Profile-level variables (like HS - total snow height):
    - Have dimensions: (location, time, slope, realization)
    - One value per profile (no layer dimension)
    """)
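
The profile-level versus layer-level distinction comes down to whether a variable has the `layer` dimension, so it can be checked automatically. The sketch below runs on a toy xarray Dataset with invented values. The helper `split_by_level` is hypothetical and not part of xsnow.

```python
import numpy as np
import xarray as xr

def split_by_level(dataset, layer_dim="layer"):
    """Return (profile-level names, layer-level names) based on the layer dimension."""
    profile_vars, layer_vars = [], []
    for name, var in dataset.data_vars.items():
        (layer_vars if layer_dim in var.dims else profile_vars).append(name)
    return profile_vars, layer_vars

toy = xr.Dataset({
    "density": (("time", "layer"), np.full((2, 3), 250.0)),
    "temperature": (("time", "layer"), np.full((2, 3), -4.0)),
    "HS": (("time",), np.array([0.9, 1.0])),
})

profile_vars, layer_vars = split_by_level(toy)
print("Profile-level:", profile_vars)
print("Layer-level:", layer_vars)
```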


### Understanding Metadata

xsnow attaches useful metadata to variables (like units, descriptions). Let's check:


In [None]:
if ds is not None:
    # Check attributes (metadata) for a variable
    if 'density' in ds.data_vars:
        print("=== METADATA FOR 'density' ===")
        print(ds['density'].attrs)
        print("\nThis tells us:")
        print("  - Units: What units the data is in")
        print("  - Long name: Human-readable description")
        print("  - Other info: Source, processing history, etc.")
    
    # Check dataset-level attributes
    print("\n=== DATASET ATTRIBUTES ===")
    if hasattr(ds, 'attrs') and len(ds.attrs) > 0:
        print(ds.attrs)
    else:
        print("Dataset-level attributes (if any):")
        print("  - History of operations")
        print("  - Source files")
        print("  - Processing information")
else:
    print("""
    Metadata (attributes) provide important information:
    - Units (kg/m³, °C, etc.)
    - Descriptions
    - Processing history
    
    Always check .attrs when working with scientific data!
    """)
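
Checking `.attrs` one variable at a time gets tedious, so a small summary table can help. The sketch below is written against plain xarray with a toy Dataset. The helper name `describe_variables` and the attribute values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

def describe_variables(dataset):
    """Tabulate dimensions and units for every data variable."""
    rows = [
        {
            "variable": name,
            "dims": ", ".join(var.dims),
            "units": var.attrs.get("units", "not specified"),
        }
        for name, var in dataset.data_vars.items()
    ]
    return pd.DataFrame(rows).set_index("variable")

toy = xr.Dataset({
    "density": (("time", "layer"), np.full((2, 3), 250.0), {"units": "kg m-3"}),
    "HS": (("time",), np.array([0.9, 1.0])),  # no units attribute, on purpose
})

summary = describe_variables(toy)
print(summary)
```

Variables that show "not specified" are the ones to double-check before doing any arithmetic that mixes units.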


### Loading Multiple Files

xsnow can load and merge multiple files at once:


In [None]:
# Example: Loading multiple files
# ds = xsnow.read(["data/file1.pro", "data/file2.pro", "data/meteo.smet"])

# Or loading all files in a directory
# ds = xsnow.read("data/")

print("""
You can load multiple files like this:

# List of files
ds = xsnow.read(["data/station1.pro", "data/station2.pro"])

# Entire directory
ds = xsnow.read("data/")

xsnow will automatically:
- Merge data from different files
- Align them by location and time
- Combine profile and meteorological data
""")
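
When building a file list by hand, it is easy to miss a file or include one of the wrong type. The sketch below collects `.pro` and `.smet` files from a directory using only the standard library. The helper name `find_snowpack_files` is made up, and the final `xsnow.read` call is left commented out because it relies on the list form shown above.

```python
from pathlib import Path

def find_snowpack_files(data_dir, suffixes=(".pro", ".smet")):
    """Return sorted file paths (as strings) in data_dir that have one of the suffixes."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )

files = find_snowpack_files("data")
print(f"Found {len(files)} file(s)")
for f in files:
    print("  ", f)

# With xsnow installed, the list could then be passed on as in the cell above:
# ds = xsnow.read(files)
```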


### The Special 'z' Coordinate

xsnow automatically computes a depth coordinate `z` that represents depth below the snow surface:
- `z = 0` at the snow surface
- `z` is negative downward (so `z = -0.5` means 50 cm below surface)

This makes it possible to select or compare layers by their depth rather than by their layer index.


In [None]:
if ds is not None and 'z' in ds.coords:
    print("=== DEPTH COORDINATE 'z' ===")
    z = ds.coords['z']
    print(f"Shape: {z.shape}")
    print(f"Example values for first profile: {z.isel(location=0, time=0, slope=0, realization=0).values}")
    print("\nRemember:")
    print("  - z = 0: Snow surface")
    print("  - z < 0: Below surface (more negative = deeper)")
else:
    print("""
    The 'z' coordinate is automatically computed by xsnow.
    It represents depth below the snow surface:
      - z = 0.0 m: Surface
      - z = -0.5 m: 50 cm below surface
      - z = -1.0 m: 1 m below surface
    
    This makes it easy to select layers by depth!
    """)
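
To show how a depth coordinate like this can be used, the sketch below builds one by hand on synthetic data and selects the upper half metre of the snowpack. It is not xsnow's own computation. This tutorial does not say whether xsnow's `z` refers to the top, middle, or bottom of a layer, so the sketch assumes layer tops.

```python
import numpy as np
import xarray as xr

# Layer thicknesses in metres for 2 time steps x 4 layers, layer 0 at the surface
thickness = xr.DataArray(
    np.array([[0.10, 0.20, 0.30, 0.40],
              [0.15, 0.20, 0.30, 0.40]]),
    dims=("time", "layer"),
)
density = xr.DataArray(
    np.array([[120.0, 200.0, 280.0, 350.0],
              [110.0, 190.0, 285.0, 355.0]]),
    dims=("time", "layer"),
    name="density",
)

# Depth of each layer's top: 0 for the surface layer, increasingly negative below
z_top = -(thickness.cumsum(dim="layer") - thickness)
density = density.assign_coords(z=z_top)

# Keep only layers whose top lies within the uppermost 0.5 m; others become NaN
upper = density.where(density.z > -0.5)
print(z_top.values)
print(upper.values)
```

Because `z` varies with time as the snowpack changes, the same depth threshold can pick out a different number of layers at each time step.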


## Summary

✅ **What we learned:**

1. **xsnow** is a Python library for working with snowpack simulation data
2. **Python fundamentals**: NumPy (arrays), pandas (tables), xarray (multi-dimensional labeled arrays)
3. **xsnow's data model**: 5 dimensions (location, time, slope, realization, layer)
4. **Loading data**: Use `xsnow.read()` to load .pro and .smet files
5. **Dataset structure**: xsnowDataset contains dimensions, coordinates, and data variables
6. **Two types of variables**: Profile-level (like HS) and layer-level (like density)
7. **Metadata**: Check `.attrs` for units and descriptions

## Key Concepts to Remember

- **xsnowDataset** = wrapper around xarray Dataset, specialized for snowpack data
- **Dimensions** = the axes of your data (location, time, slope, realization, layer)
- **Coordinates** = labels for dimensions (station names, dates, etc.)
- **Profile-level** = one value per profile (no layer dimension)
- **Layer-level** = one value per layer per profile (has layer dimension)

## Next Steps

Ready to start working with the data? Move on to:
- **02_basic_operations_and_analysis.ipynb**: Learn how to select, filter, and analyze your data

## Exercises (Try These!)

1. Load a sample .pro file and print its dimensions
2. List all the data variables in your dataset
3. Check the units for density and temperature
4. Find the time range of your data
5. Count how many layers the deepest profile has
