# 05: Working with Custom Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/05_working_with_custom_data.ipynb)

This notebook shows you how to prepare and load your own SNOWPACK output files into xsnow.

## What You'll Learn

- Preparing your own .pro and .smet files
- File format requirements
- Loading custom data
- Troubleshooting common issues
- Merging multiple data sources
- Data validation


### Learning objectives
- Review xsnow file expectations so your custom data loads cleanly.
- Practice validating file paths, headers, and required variables.
- Load one or many files with `xsnow.read` while trapping common errors.
- Merge profile and meteorological datasets to build a richer context.

**Prerequisites**
- [ ] Comfortable with notebooks 01–04.
- [ ] Basic familiarity with filesystem paths.
- [ ] Ability to interpret snow profile variables in xsnow.


## Installation (For Colab Users)
**Show.** Install xsnow and scientific dependencies when running remotely.


In [None]:
# Run.
%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow


## Setup: Reference Dataset
**Show.** Load the sample dataset so you always have a known-good structure for validation.


In [None]:
# Run.
import xsnow
import numpy as np
from pathlib import Path

print('Loading xsnow reference dataset...')
reference_ds = xsnow.single_profile_timeseries()
print('✅ Reference dataset dims:', dict(reference_ds.dims))


**Explain.** A working dataset gives you something to compare against when debugging custom files.


In [None]:
# Check for understanding: reference dataset
assert reference_ds is not None
assert 'location' in reference_ds.dims


## Part 1: File Format Requirements
**Show.** Summarize the suffixes and metadata xsnow expects.


In [None]:
# Run.
required_suffixes = {'.pro', '.smet'}
print('Accepted file suffixes (.pro = profiles, .smet = meteo):', sorted(required_suffixes))
print('Required coordinate names:', ['time', 'location', 'layer'])


**Explain.** Confirming file types early prevents silent parsing failures later.
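
One way to apply that check is to group candidate paths by suffix before any parsing happens. The sketch below uses only the standard library; the helper name `group_by_suffix` and the sample filenames are hypothetical and not part of xsnow, and the case-insensitive match is an assumption (xsnow itself may be stricter).

```python
from pathlib import Path

ACCEPTED_SUFFIXES = {'.pro', '.smet'}  # same set as required_suffixes above

def group_by_suffix(paths):
    """Split paths into accepted files (keyed by suffix) and rejected ones."""
    accepted = {suffix: [] for suffix in sorted(ACCEPTED_SUFFIXES)}
    rejected = []
    for path in map(Path, paths):
        suffix = path.suffix.lower()  # assumption: treat '.PRO' like '.pro'
        if suffix in accepted:
            accepted[suffix].append(path)
        else:
            rejected.append(path)
    return accepted, rejected

# Hypothetical filenames, used only to exercise the helper
accepted, rejected = group_by_suffix(
    ['station1.pro', 'station1.smet', 'notes.txt', 'station2.PRO']
)
print('Accepted:', {k: [p.name for p in v] for k, v in accepted.items()})
print('Rejected:', [p.name for p in rejected])
```

Anything that lands in `rejected` is worth a second look before you build a file list for loading.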


In [None]:
# Check for understanding: suffix set
assert '.pro' in required_suffixes
assert '.smet' in required_suffixes


## Part 2: Preparing Your Files
**Show.** Check directories and filenames before attempting to load them.


In [None]:
# Run.
data_dir = Path('data')
pro_files = sorted(data_dir.glob('*.pro'))
smet_files = sorted(data_dir.glob('*.smet'))
print('Found profile files:', [p.name for p in pro_files])
print('Found meteo files:', [p.name for p in smet_files])


**Explain.** Verifying file presence and extension catches typos before they trigger stack traces.


In [None]:
# Check for understanding: file listing
assert isinstance(pro_files, list)
assert isinstance(smet_files, list)


## Part 3: Loading Your Custom Data
**Show.** Use `xsnow.read` to open single files or collections.


In [None]:
# Run.
example_single = data_dir / 'example.pro'
example_many = [data_dir / 'station1.pro', data_dir / 'station2.pro']
print('Single-file pattern:', example_single)
print('Multi-file pattern:', example_many)
# For demonstration, reuse the reference dataset. With your own files, call
# xsnow.read(example_single) or xsnow.read(example_many) instead.
custom_ds = reference_ds
print('Placeholder dataset dimensions:', dict(custom_ds.dims))


**Explain.** Passing either a path or list gives xsnow the context it needs to assemble a dataset.
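
A guarded load can be sketched as follows: read the files only when all of them exist, and otherwise fall back to a known-good dataset. The helper `load_or_fallback` and the stand-in `fake_reader` are hypothetical names introduced for illustration; in the notebook you would pass `xsnow.read` as the reader and `reference_ds` as the fallback.

```python
from pathlib import Path

def load_or_fallback(paths, reader, fallback):
    """Call reader(paths) when every file exists; otherwise return fallback."""
    single = isinstance(paths, (str, Path))
    path_list = [Path(paths)] if single else [Path(p) for p in paths]
    absent = [p for p in path_list if not p.exists()]
    if absent:
        print('Missing files, using fallback:', [str(p) for p in absent])
        return fallback
    # Hand a single path through as a path, and a collection as a list
    return reader(path_list[0] if single else path_list)

# Stand-in reader so the sketch runs without xsnow or real data files
def fake_reader(source):
    return f'loaded {source}'

result = load_or_fallback(Path('data') / 'does_not_exist.pro', fake_reader, fallback='reference')
print(result)
```

Injecting the reader as an argument keeps the existence check testable without any real SNOWPACK output on disk.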


In [None]:
# Check for understanding: dataset placeholder
assert custom_ds is reference_ds
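# xarray exposes dims as a read-only mapping rather than a plain dict,
# so check for an expected dimension instead of the container type
assert 'location' in custom_ds.dims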


### Loading Multiple Files
**Show.** Build file lists programmatically to avoid manual errors.


In [None]:
# Run.
station_codes = ['station1', 'station2', 'station3']
pattern = [data_dir / f'{code}.pro' for code in station_codes]
print('Pattern list:', pattern)


**Explain.** Generating lists keeps your workflow reproducible and shareable.


In [None]:
# Check for understanding: pattern list
assert len(pattern) == 3
assert all(p.suffix == '.pro' for p in pattern)


## Part 4: Troubleshooting Common Issues
**Show.** Guard against missing files and unexpected headers.


In [None]:
# Run.
def file_exists(path: Path) -> bool:
    exists = path.exists()
    if not exists:
        print(f'⚠️ Missing file: {path}')
    return exists

example_path = data_dir / 'station1.pro'
file_exists(example_path)


**Explain.** Checking for existence first avoids confusing stack traces from deep within xsnow.
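
The same idea extends to a whole batch: split the candidate list into files that are present and files that are not, then decide whether to load the partial set or stop. This is a minimal sketch; `split_existing` is a hypothetical helper and the station codes simply mirror the ones used earlier.

```python
from pathlib import Path

def split_existing(paths):
    """Return (found, missing) lists so a batch load can skip absent files."""
    found, missing = [], []
    for path in map(Path, paths):
        (found if path.is_file() else missing).append(path)
    return found, missing

candidates = [Path('data') / f'{code}.pro' for code in ('station1', 'station2', 'station3')]
found, missing_files = split_existing(candidates)
print(f'{len(found)} found, {len(missing_files)} missing')
for path in missing_files:
    print('  missing:', path)
```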


In [None]:
# Check for understanding: existence helper
assert callable(file_exists)
assert isinstance(file_exists(example_path), bool)


### Inspecting Headers
**Show.** Peek at the first few lines to confirm formatting.


In [None]:
# Run.
try:
    header_preview = example_path.read_text().splitlines()[:5]
except FileNotFoundError:
    # Placeholder lines so the rest of the cell still runs without the file
    header_preview = ['# Placeholder header (file not found)', 'fields = time HS TA']
print('Header preview:', header_preview)


**Explain.** A quick glance reveals whether required fields like `time` or `HS` are present.
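
For `.smet` files the check can go one step further than a glance. The sketch below assumes the usual SMET layout, a `[HEADER]` block of `key = value` lines followed by `[DATA]`, and pulls out the `fields` entry; `.pro` files use a different layout and are not covered. The helper `parse_smet_header` and the sample content are made up for illustration.

```python
def parse_smet_header(lines):
    """Collect key = value pairs found between [HEADER] and [DATA]."""
    header = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line.upper() == '[HEADER]':
            in_header = True
        elif line.upper() == '[DATA]':
            break  # stop before the data rows
        elif in_header and '=' in line:
            key, value = line.split('=', 1)
            header[key.strip()] = value.strip()
    return header

# Made-up sample for illustration, not taken from a real station file
sample = """SMET 1.1 ASCII
[HEADER]
station_id = demo_station
nodata = -999
fields = timestamp TA HS
[DATA]
2024-01-01T00:00 271.3 1.20
""".splitlines()

header = parse_smet_header(sample)
fields = header.get('fields', '').split()
print('Fields:', fields)
print('Has HS column:', 'HS' in fields)
```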


In [None]:
# Check for understanding: header preview
assert isinstance(header_preview, list)
assert len(header_preview) > 0


### Checking Variables
**Show.** Confirm that required variables made it into the dataset.


In [None]:
# Run.
required_vars = {'HS', 'density'}
# sorted() gives a stable order, which makes the printed report reproducible
missing = sorted(required_vars - set(custom_ds.data_vars))
print('Missing variables:', missing)


**Explain.** Comparing against a required set keeps you from analyzing incomplete data.
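
If an incomplete dataset should stop the workflow rather than only print a list, the comparison can raise an error. This is a sketch under that assumption; `require_variables` is a hypothetical helper, and plain lists stand in for `custom_ds.data_vars` so the example runs on its own.

```python
def require_variables(available, required):
    """Raise ValueError naming every required variable that is absent."""
    missing_vars = sorted(set(required) - set(available))
    if missing_vars:
        raise ValueError(
            f'Required variables missing: {missing_vars}. '
            f'Available: {sorted(available)}'
        )

# Plain lists stand in for custom_ds.data_vars here
try:
    require_variables(available=['HS', 'TA'], required={'HS', 'density'})
except ValueError as err:
    message = str(err)
    print(message)
```

Listing the available variables next to the missing ones makes it easier to spot a naming mismatch.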


In [None]:
# Check for understanding: variable coverage
assert isinstance(missing, list)
assert required_vars == {'HS', 'density'}


### Time Alignment
**Show.** Verify that timestamps are monotonic and unique.


In [None]:
# Run.
times = custom_ds['time'].values
# Strict '>' rejects duplicate timestamps as well as out-of-order ones
is_sorted = bool((times[1:] > times[:-1]).all()) if len(times) > 1 else True
print('Times strictly increasing:', is_sorted)


**Explain.** Misordered timestamps can break merges and rolling calculations.
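
When the check fails, it helps to know where. The sketch below reports the positions of adjacent repeats and backward steps. It is written in plain Python on a list of `datetime` objects to stay dependency-free; `find_time_problems` is a hypothetical helper, and for long time axes a vectorized NumPy comparison would be faster.

```python
from datetime import datetime

def find_time_problems(times):
    """Return indices where a timestamp repeats its predecessor or goes backwards."""
    duplicates, out_of_order = [], []
    for i in range(1, len(times)):
        if times[i] == times[i - 1]:
            duplicates.append(i)
        elif times[i] < times[i - 1]:
            out_of_order.append(i)
    return duplicates, out_of_order

# Made-up timestamps with one repeat and one backward step
stamps = [
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 1, 6),
    datetime(2024, 1, 1, 6),   # repeats the previous entry
    datetime(2024, 1, 1, 3),   # goes backwards
    datetime(2024, 1, 1, 12),
]
duplicates, out_of_order = find_time_problems(stamps)
print('Duplicate at indices:', duplicates)
print('Out of order at indices:', out_of_order)
```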


In [None]:
# Check for understanding: time monotonicity
assert isinstance(is_sorted, (bool, np.bool_))


## Part 5: Data Validation
**Show.** Run lightweight checks before trusting analysis outputs.


In [None]:
# Run.
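def validate_dataset(ds):
    issues = []
    # Test each variable separately: the truth value of a whole xarray Dataset
    # only says whether it has data variables, not whether any value is NaN.
    if any(bool(ds[name].isnull().any()) for name in ds.data_vars):
        issues.append('Contains NaNs')
    if bool((ds['HS'] < 0).any()):
        issues.append('Negative snow height detected')
    return issues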

validation_messages = validate_dataset(custom_ds)
print('Validation issues:', validation_messages or 'None detected')


**Explain.** Automated checks keep datasets healthy as new files roll in.


In [None]:
# Check for understanding: validation helper
assert callable(validate_dataset)
assert isinstance(validation_messages, list)


## Part 6: Merging Profile and Meteorological Data
**Show.** Combine complementary datasets for context.


In [None]:
# Run.
profile_subset = reference_ds[['HS', 'density']]
meteo_like = reference_ds[['TA']] if 'TA' in reference_ds.data_vars else reference_ds[['HS']]
merged = profile_subset.merge(meteo_like, join='inner')
print('Merged variables:', list(merged.data_vars))


**Explain.** Merging puts snow structure and weather in the same dataset for downstream models; `join='inner'` keeps only the coordinate values that both inputs share.


In [None]:
# Check for understanding: merge result
assert set(profile_subset.dims).issuperset(set(merged.dims))


### Play
Experiment with different glob patterns or required variable sets to match your project.


In [None]:
# Run.
glob_pattern = '*.pro'  # Try '*.smet' or '**/*.pro'
required_vars_play = {'HS', 'temperature'}
files = sorted(data_dir.glob(glob_pattern))
print('Matched files:', [f.name for f in files])
print('Required vars to check:', required_vars_play)


## Practice
Challenge yourself before reading the answers.


1. Write a helper that validates coordinate names before calling `xsnow.read`.
2. Create a summary table counting how many files each location contributes.
3. Draft an error message template for missing required variables.


<details>
<summary>Solutions</summary>

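1. Before calling `xsnow.read`, check `path.suffix` and `path.name` against expected tokens. Coordinate names only exist once the data is loaded, so compare `ds.dims` against the expected names afterwards.
2. If your filenames encode the station, count files per location before loading with `collections.Counter`. After loading, `ds['HS'].count(dim='time')` counts the time steps with a valid snow height at each site.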
3. Use something like `f"Required variables missing: {missing}"` to guide the next steps.

</details>


## Summary
- Validate filenames, headers, and variables before loading custom xsnow data.
- `xsnow.read` accepts either a single path or a list of paths.
- Small helper functions catch issues early and streamline future ingests.
