# 05: Working with Custom Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/05_working_with_custom_data.ipynb)

This notebook shows you how to prepare and load your own SNOWPACK output files into xsnow.

## What You'll Learn

- Preparing your own .pro and .smet files
- File format requirements
- Loading custom data
- Troubleshooting common issues
- Merging multiple data sources
- Data validation


## Installation (For Colab Users)

If you're using Google Colab, uncomment the install lines in the cell below and run it to install xsnow and its dependencies. If you're running locally and have already installed xsnow, you can skip this cell.


In [None]:
# Install xsnow from git (run this cell if using Colab or if you haven't installed yet)
# Uncomment the lines below to install:

# %pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
# %pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow

print("To install xsnow, uncomment the pip install lines above and run this cell.")
print("If xsnow is already installed, you can skip this cell.")


In [None]:
import glob
import os

import numpy as np  # used in the data validation cell (Part 5)
import xsnow

print("This notebook will help you work with your own data files.")


In [None]:
# Example: Explore xsnow sample data
import xsnow

print("xsnow provides two main sample datasets:")
print()

# Example 1: Single profile
print("1. Single profile (one snapshot):")
try:
    ds_single = xsnow.single_profile()
    print(f"   ✅ Loaded! Dimensions: {dict(ds_single.dims)}")
except Exception as e:
    print(f"   ❌ Error: {e}")

print()
# Example 2: Time series
print("2. Time series (multiple snapshots over time):")
try:
    ds_timeseries = xsnow.single_profile_timeseries()
    print(f"   ✅ Loaded! Dimensions: {dict(ds_timeseries.dims)}")
except Exception as e:
    print(f"   ❌ Error: {e}")


## Part 1: File Format Requirements

xsnow can read SNOWPACK output files in these formats:

### .pro Files (Profile Time Series)

- **Format**: SNOWPACK profile format (legacy)
- **Contains**: Time series of snow profiles with layer-by-layer data
- **Required**: Header with station metadata, profile data blocks
- **Generated by**: SNOWPACK when `PROF_FORMAT = PRO` in .ini file
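
A quick structural check can tell you whether a file looks like a real .pro file before you hand it to xsnow. The sketch below is a minimal example, assuming the section markers `[STATION_PARAMETERS]`, `[HEADER]`, and `[DATA]` that are commonly seen in SNOWPACK .pro output; the helper name `find_pro_sections` is hypothetical and not part of xsnow. Compare the markers against a file from your own SNOWPACK version.

```python
from pathlib import Path

# Assumption: these section markers appear in .pro files written by SNOWPACK.
# Verify against a file produced by your own SNOWPACK version.
EXPECTED_PRO_SECTIONS = ("[STATION_PARAMETERS]", "[HEADER]", "[DATA]")


def find_pro_sections(path, max_lines=500):
    """Return {marker: line number or None}, scanning the first `max_lines` lines."""
    found = {marker: None for marker in EXPECTED_PRO_SECTIONS}
    with Path(path).open("r", encoding="utf-8", errors="replace") as f:
        for line_number, line in enumerate(f, start=1):
            if line_number > max_lines:
                break
            stripped = line.strip()
            # Record only the first occurrence of each marker
            if stripped in found and found[stripped] is None:
                found[stripped] = line_number
    return found
```

A marker that maps to `None` was not found, which usually means the file is truncated or is not a .pro file at all.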

### .smet Files (Meteorological Time Series)

- **Format**: SMET (MeteoIO format)
- **Contains**: Time series of scalar variables (no layers)
- **Required**: SMET header with field descriptions, time series data
- **Generated by**: SNOWPACK or MeteoIO for meteorological data
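
To see what a .smet file declares before loading it, you can read its header with plain Python. This is a rough sketch, assuming the usual SMET layout: a signature line (for example `SMET 1.1 ASCII`), a `[HEADER]` section of `key = value` pairs including a `fields` entry, then a `[DATA]` section. The function name `read_smet_header` is hypothetical; xsnow does its own parsing.

```python
def read_smet_header(path):
    """Return (signature, header dict, list of field names) from a SMET file."""
    header = {}
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        signature = f.readline().strip()  # first line identifies the format
        in_header = False
        for line in f:
            line = line.strip()
            if line == "[HEADER]":
                in_header = True
            elif line == "[DATA]":
                break  # stop before the time series begins
            elif in_header and "=" in line and not line.startswith("#"):
                key, value = line.split("=", 1)
                header[key.strip()] = value.strip()
    # Assumption: the 'fields' entry lists the column names, space-separated
    fields = header.get("fields", "").split()
    return signature, header, fields
```

An empty `fields` list or a signature that does not start with `SMET` suggests the file is not in SMET format.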

### Other Formats

xsnow may support other formats (check documentation):
- NetCDF (if SNOWPACK outputs to NetCDF)
- Other SNOWPACK output formats


## Part 2: Preparing Your Files

### Step 1: Generate SNOWPACK Output

If you're running SNOWPACK yourself:

1. **Configure SNOWPACK** (via Inishell or .ini file):
   - Set `PROF_FORMAT = PRO` to generate .pro files
   - Configure which variables to output
   - Set output directory

2. **Run SNOWPACK** simulation

3. **Check output files**:
   - Look for `.pro` files in output directory
   - Check for `.smet` files if configured
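
If no .pro files appear, it can help to confirm what the .ini file actually requests. The sketch below reads the setting with Python's standard `configparser`. It assumes that `PROF_FORMAT` lives in an `[OUTPUT]` section of the .ini file; the helper name `read_output_settings` is hypothetical, and you can pass other key names that your configuration uses.

```python
import configparser


def read_output_settings(ini_path, keys=("PROF_FORMAT",)):
    """Return {key: value or None} from the [OUTPUT] section of a SNOWPACK .ini file."""
    parser = configparser.ConfigParser(
        strict=False,                     # tolerate repeated keys
        interpolation=None,               # treat '%' literally
        delimiters=("=",),                # keys may contain ':' characters
        inline_comment_prefixes=("#", ";"),
    )
    with open(ini_path, encoding="utf-8") as f:
        parser.read_file(f)
    # Assumption: output options sit in a section named OUTPUT (any letter case)
    section = next((s for s in parser.sections() if s.upper() == "OUTPUT"), None)
    if section is None:
        return {}
    # Option lookup is case-insensitive with configparser's defaults
    return {key: parser.get(section, key, fallback=None) for key in keys}
```

For example, `read_output_settings("your_config.ini")` returning `{"PROF_FORMAT": "PRO"}` matches the setting described in Step 1; a value of `None` means the key is not set in that section.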

### Step 2: Verify File Format

Let's check if your files are in the correct format:


In [None]:
# Check for .pro files in data directory
data_dir = "data"
pro_files = glob.glob(os.path.join(data_dir, "*.pro"))
smet_files = glob.glob(os.path.join(data_dir, "*.smet"))

print(f"Found {len(pro_files)} .pro files:")
for f in pro_files[:5]:  # Show first 5
    print(f"  - {os.path.basename(f)}")

print(f"\nFound {len(smet_files)} .smet files:")
for f in smet_files[:5]:  # Show first 5
    print(f"  - {os.path.basename(f)}")

# Quick format check
if pro_files:
    print("\n" + "="*50)
    print("Checking first .pro file format...")
    first_file = pro_files[0]
    with open(first_file, 'r') as f:
        first_lines = [f.readline() for _ in range(5)]
        print("First few lines:")
        for i, line in enumerate(first_lines):
            print(f"  {i+1}: {line.strip()[:80]}")


## Part 3: Loading Your Custom Data

Now let's load your files:


In [None]:
# Method 1: Load a single file
if pro_files:
    try:
        ds = xsnow.read(pro_files[0])
        print(f"✅ Successfully loaded: {os.path.basename(pro_files[0])}")
        print(f"\nDataset summary:")
        print(f"  Locations: {ds.dims.get('location', 0)}")
        print(f"  Time steps: {ds.dims.get('time', 0)}")
        print(f"  Layers: {ds.dims.get('layer', 0)}")
        print(f"  Variables: {len(ds.data_vars)}")
    except Exception as e:
        print(f"❌ Error loading file: {e}")
        print("\nTroubleshooting:")
        print("1. Check file format is correct")
        print("2. Verify file is not corrupted")
        print("3. Check xsnow documentation for format requirements")
        ds = None
else:
    print("No .pro files found in data/ directory")
    print("\nTo load your own files:")
    print("1. Place .pro or .smet files in the data/ directory")
    print("2. Or specify full path: ds = xsnow.read('/path/to/your/file.pro')")
    ds = None


### Loading Multiple Files

You can load multiple files at once:


In [None]:
# Method 2: Load multiple files
if len(pro_files) > 1:
    try:
        ds_multi = xsnow.read(pro_files[:3])  # Load first 3 files
        print(f"✅ Successfully loaded {len(pro_files[:3])} files")
        print(f"  Combined locations: {ds_multi.dims.get('location', 0)}")
        print(f"  Time steps: {ds_multi.dims.get('time', 0)}")
    except Exception as e:
        print(f"❌ Error loading multiple files: {e}")
        print("\nNote: Files must have compatible formats and time ranges")
else:
    print("""
    Loading multiple files:
    
    # List of files
    ds = xsnow.read(['data/file1.pro', 'data/file2.pro'])
    
    # All files in directory
    ds = xsnow.read('data/')
    
    # Mix of .pro and .smet
    ds = xsnow.read(['data/profile.pro', 'data/meteo.smet'])
    """)


## Part 4: Troubleshooting Common Issues

### Issue 1: File Not Found

**Error**: `FileNotFoundError` or similar

**Solutions**:
- Check file path is correct
- Use absolute paths if relative paths don't work
- Verify file exists: `os.path.exists('path/to/file.pro')`


In [None]:
# Example: Check if file exists before loading
test_file = "data/your_file.pro"
if os.path.exists(test_file):
    print(f"✅ File exists: {test_file}")
    # ds = xsnow.read(test_file)
else:
    print(f"❌ File not found: {test_file}")
    print("\nTips:")
    print("1. Check spelling of filename")
    print("2. Use absolute path: /full/path/to/file.pro")
    print("3. Check current directory: os.getcwd()")


### Issue 2: Format Not Recognized

**Error**: File format not supported or parsing errors

**Solutions**:
- Verify file is actual .pro or .smet format (not just renamed)
- Check file header matches expected format
- Try opening file in text editor to inspect structure
- Check SNOWPACK version compatibility


In [None]:
# Inspect file header
if pro_files:
    print("Inspecting file header...")
    with open(pro_files[0], 'r') as f:
        header_lines = [f.readline().strip() for _ in range(20)]
        print("First 20 lines:")
        for i, line in enumerate(header_lines):
            if line:  # Skip empty lines
                print(f"  {i+1}: {line[:100]}")
    
    print("\nLook for:")
    print("  - Station name/identifier")
    print("  - Coordinates (latitude, longitude)")
    print("  - Column headers")
    print("  - Data format indicators")


### Issue 3: Missing Variables

**Problem**: Expected variables not in dataset

**Solutions**:
- Check SNOWPACK output configuration
- Verify variables were enabled in SNOWPACK .ini file
- Some variables may be computed by xsnow (like HS, z)
- Check variable names match xsnow's expected names
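
When an expected variable is missing, the cause is often just a naming difference. The following sketch uses the standard library's `difflib` to suggest similarly spelled names from whatever the dataset contains. The helper name `report_missing` is hypothetical, and the names in the example call are illustrative rather than xsnow's official variable names.

```python
import difflib


def report_missing(expected, available, cutoff=0.6):
    """Map each missing expected name to a list of similarly spelled available names."""
    available = list(available)
    report = {}
    for name in expected:
        if name in available:
            continue  # present, nothing to report
        report[name] = difflib.get_close_matches(name, available, n=3, cutoff=cutoff)
    return report


# Illustrative names only; with a loaded dataset, pass list(ds.data_vars) as `available`
print(report_missing(["density", "grainsize", "lwc"],
                     ["density", "grain_size", "temperature"]))
```

An empty suggestion list means nothing similar exists, so the variable was probably never written by SNOWPACK.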


In [None]:
# List the variables in the dataset (requires ds from Part 3)
if ds is not None:
    print("Variables in your dataset:")
    for var in list(ds.data_vars.keys())[:20]:  # Show first 20
        print(f"  - {var}")

    # Check for common variables
    common_vars = ['density', 'temperature', 'HS', 'grain_type', 'grain_size']
    print("\nChecking for common variables:")
    for var in common_vars:
        if var in ds.data_vars:
            print(f"  ✅ {var}")
        else:
            print(f"  ❌ {var} (not found)")
else:
    print("Load data first to check variables")


### Issue 4: Time Alignment Problems

**Problem**: Multiple files have different time ranges or frequencies

**Solutions**:
- xsnow will try to align times automatically
- Check time ranges: `ds.coords['time'].values`
- Resample if needed: `ds.resample(time='1h').mean()`
- Manually select overlapping time periods
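
Selecting the overlapping period by hand comes down to finding the latest start and the earliest end of the two time axes. Here is a minimal sketch with a hypothetical helper, `overlap_window`, that works on any sequence of comparable timestamps (Python `datetime` objects here; a 1-D array from `ds.coords['time'].values` should behave the same way).

```python
from datetime import datetime, timedelta


def overlap_window(times_a, times_b):
    """Return (start, end) of the period covered by both sequences, or None if disjoint."""
    start = max(min(times_a), min(times_b))  # latest of the two start times
    end = min(max(times_a), max(times_b))    # earliest of the two end times
    return (start, end) if start <= end else None


# Hourly series for 3 days and a 3-hourly series starting one day later
hourly = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(72)]
three_hourly = [datetime(2024, 1, 2) + timedelta(hours=3 * h) for h in range(24)]
print(overlap_window(hourly, three_hourly))
```

The returned pair can then be used for selection, for example `ds.sel(time=slice(start, end))`.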


In [None]:
# Inspect the time axis (requires ds from Part 3)
if ds is not None:
    times = ds.coords['time'].values
    print("Time range in dataset:")
    print(f"  Start: {times[0]}")
    print(f"  End: {times[-1]}")
    print(f"  Number of time steps: {len(times)}")

    # Check time frequency
    if len(times) > 1:
        time_diff = times[1] - times[0]
        print(f"  Time step: {time_diff}")
else:
    print("Load data first to check time ranges")

print("\nIf merging multiple files with different times:")
print("  xsnow will try to align them automatically")
print("  Or manually select: ds.sel(time=slice('2024-01-01', '2024-01-31'))")


## Part 5: Data Validation

After loading, validate your data:


In [None]:
import numpy as np  # needed for the NaN checks below

print("=== DATA VALIDATION ===")

if ds is None:
    print("Load data first to run validation")
else:
    # Check density for NaN values and a reasonable value range
    if 'density' in ds.data_vars:
        density_vals = ds['density'].values
        nan_count = int(np.isnan(density_vals).sum())
        print(f"\nNaN values in density: {nan_count} / {density_vals.size}")
        if nan_count > 0:
            print("  ⚠️ Some NaN values found (may be normal for shallow snowpack)")

        valid_vals = density_vals[~np.isnan(density_vals)]
        if len(valid_vals) > 0:
            print(f"\nDensity range: {valid_vals.min():.1f} to {valid_vals.max():.1f} kg/m³")
            if valid_vals.min() < 0 or valid_vals.max() > 1000:
                print("  ⚠️ Unusual density values - check data quality")

    # Check temperature range (expects °C)
    if 'temperature' in ds.data_vars:
        temp_vals = ds['temperature'].values
        valid_vals = temp_vals[~np.isnan(temp_vals)]
        if len(valid_vals) > 0:
            print(f"Temperature range: {valid_vals.min():.1f} to {valid_vals.max():.1f} °C")
            if valid_vals.min() < -50 or valid_vals.max() > 10:
                print("  ⚠️ Unusual temperature values - check units")

    # Check dimensions
    print("\nDataset dimensions:")
    for dim, size in ds.dims.items():
        print(f"  {dim}: {size}")

    print("\n✅ Validation complete")
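

If you want to apply the same kind of range check to several variables, it can be wrapped in a small helper. This is only a sketch: `summarize_range` and `LIMITS` are hypothetical names, and the limits simply repeat the density and temperature thresholds used in the validation cell, so adjust them to your data.

```python
import math


def summarize_range(values, lower, upper):
    """Count total, missing (NaN/None), and out-of-range entries in an iterable of numbers."""
    total = missing = out_of_range = 0
    for v in values:
        total += 1
        if v is None or math.isnan(v):
            missing += 1
        elif v < lower or v > upper:
            out_of_range += 1
    return {"total": total, "missing": missing, "out_of_range": out_of_range}


# Assumed plausibility limits: density in kg/m³, temperature in °C
LIMITS = {"density": (0, 1000), "temperature": (-50, 10)}

print(summarize_range([100.0, float("nan"), 1200.0, 350.0], *LIMITS["density"]))
```

With a loaded dataset, a call such as `summarize_range(ds['density'].values.ravel(), *LIMITS['density'])` should give the same counts as the cell above, although a plain Python loop will be slow on large datasets.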


## Part 6: Merging Profile and Meteorological Data


In [None]:
# Example: Merge profile and meteo data
if pro_files and smet_files:
    try:
        # Load both types
        ds_combined = xsnow.read([pro_files[0], smet_files[0]])
        print("✅ Successfully merged profile and meteorological data")
        print(f"  Variables: {len(ds_combined.data_vars)}")
        print("\nVariables from .pro (layer-level):")
        layer_vars = [v for v in ds_combined.data_vars if 'layer' in ds_combined[v].dims]
        for v in layer_vars[:5]:
            print(f"  - {v}")
        
        print("\nVariables from .smet (profile-level, no layers):")
        profile_vars = [v for v in ds_combined.data_vars if 'layer' not in ds_combined[v].dims]
        for v in profile_vars[:5]:
            print(f"  - {v}")
    except Exception as e:
        print(f"Error merging: {e}")
else:
    print("""
    Merging profile and meteo data:
    
    # Load both at once
    ds = xsnow.read(['data/profile.pro', 'data/meteo.smet'])
    
    # Or load separately and merge
    ds_pro = xsnow.read('data/profile.pro')
    ds_met = xsnow.read('data/meteo.smet')
    ds_combined = xsnow.merge([ds_pro, ds_met])  # If merge function exists
    """)
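

When a directory holds output for several stations, you first need to know which .pro file belongs with which .smet file. The sketch below pairs them by file name, assuming that matching files share the same stem (for example `station1.pro` and `station1.smet`); the helper name `pair_by_stem` is hypothetical, and your naming scheme may differ.

```python
from pathlib import Path


def pair_by_stem(data_dir):
    """Return a sorted list of (pro_path, smet_path or None) pairs for one directory."""
    directory = Path(data_dir)
    smet_by_stem = {p.stem: p for p in directory.glob("*.smet")}
    pairs = []
    for pro in sorted(directory.glob("*.pro")):
        # None marks a profile file that has no matching meteo file
        pairs.append((pro, smet_by_stem.get(pro.stem)))
    return pairs
```

Each complete pair can then be loaded together using the list form shown above, for example `xsnow.read([str(pro), str(smet)])`.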


## Summary

✅ **What we learned:**

1. **File formats**: .pro (profiles) and .smet (meteorological) files
2. **Loading custom data**: Use `xsnow.read()` with your file paths
3. **Multiple files**: Load lists of files or entire directories
4. **Troubleshooting**: Common issues and solutions
5. **Validation**: Check data quality and ranges
6. **Merging**: Combine profile and meteo data

## Key Tips

- **File paths**: Use absolute paths if relative paths cause issues
- **Format verification**: Inspect file headers to ensure correct format
- **Variable names**: Check xsnow documentation for expected variable names
- **Time alignment**: xsnow tries to handle this automatically when merging; check the resulting time range afterwards
- **Data quality**: Always validate loaded data

## Next Steps

Now that you can load your own data:
- Apply analysis techniques from previous notebooks
- Create visualizations with your data
- Or learn to extend xsnow: **06_extending_xsnow.ipynb**

## Exercises

1. Load one of your own .pro files and inspect its structure
2. Check for missing variables and verify data ranges
3. Load multiple files and compare their time ranges
4. Merge a .pro and .smet file if you have both
5. Validate your data and identify any quality issues
