# 09: Performance and Storage

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Austfi/xsnowForPatrol/blob/main/notebooks/09_performance_and_storage.ipynb)

This notebook covers performance optimization and storage formats for large xsnow datasets.

## What You'll Learn

- Data type optimization (float32 vs float64)
- Memory management strategies
- Zarr format for large datasets
- Chunking and compression strategies

> **Note**: This is a reference notebook covering performance topics. The main tutorial notebooks focus on core functionality.

## Installation (For Colab Users)

If you're using Google Colab, run the cell below to install xsnow and dependencies.

In [None]:
%pip install -q numpy pandas xarray matplotlib seaborn dask netcdf4 "zarr<3"
%pip install -q git+https://gitlab.com/avacollabra/postprocessing/xsnow


In [None]:
import xsnow
import numpy as np
import xarray as xr
import zarr

# Load sample data
ds = xsnow.single_profile_timeseries()
print("✅ Data loaded successfully")


## Part 1: Data Type Optimization

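For large datasets, data types matter: float64 stores 8 bytes per value and float32 stores 4, so downcasting halves the memory of each converted variable. float32 keeps roughly 7 significant decimal digits, which is typically ample for snowpack variables such as density (kg/m³) or temperature.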
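A minimal NumPy-only sketch of the trade-off. The array is synthetic (made-up density-like values, not xsnow output); it shows that float32 halves `nbytes` and that the relative rounding error introduced by the downcast stays below float32 machine epsilon.

```python
import numpy as np

# Synthetic density-like values in kg/m³ (made up, not real snowpack data)
rng = np.random.default_rng(0)
density64 = rng.uniform(50.0, 550.0, size=100_000)  # float64 by default

density32 = density64.astype(np.float32)

# Memory: float32 uses 4 bytes per value instead of 8
print(f"float64: {density64.nbytes / 1024**2:.2f} MB")
print(f"float32: {density32.nbytes / 1024**2:.2f} MB")

# Precision: compare the downcast values back against the float64 originals
rel_err = np.max(np.abs(density32.astype(np.float64) - density64) / density64)
print(f"Max relative error after downcast: {rel_err:.1e}")
print(f"float32 machine epsilon: {np.finfo(np.float32).eps:.1e}")
```

For values in this range the error is far below any plausible measurement precision, which is why range checks alone rarely block a float32 conversion; precision is the property worth checking.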

In [None]:
# Check current data types
print("Current data types:")
for var in list(ds.data_vars.keys())[:5]:
    dtype = ds[var].dtype
    size_mb = ds[var].nbytes / (1024**2)
    print(f"  {var}: {dtype} ({size_mb:.2f} MB)")


In [None]:
# Example: convert float64 to float32 (halves memory for float64 variables)
density = ds['density']

# nanmin/nanmax ignore NaN padding; a plain .min() returns NaN if any value is NaN,
# which would make the range check below silently fail
density_min = np.nanmin(density.values)
density_max = np.nanmax(density.values)

print(f"Original dtype: {density.dtype}")
print(f"Value range: {density_min:.1f} to {density_max:.1f} kg/m³")
print(f"Original size: {density.nbytes / (1024**2):.2f} MB")

# float32 covers roughly ±3.4e38, so the range check almost always passes;
# precision (about 7 significant digits) is the practical constraint
f32 = np.finfo(np.float32)
if f32.min <= density_min and density_max <= f32.max:
    density_float32 = density.astype('float32')
    saved = density.nbytes - density_float32.nbytes
    print(f"Converted to float32: {density_float32.dtype}")
    print(f"New size: {density_float32.nbytes / (1024**2):.2f} MB")
    # Report the actual reduction (0% if the variable was already float32)
    print(f"Memory saved: {saved / (1024**2):.2f} MB ({100 * saved / density.nbytes:.0f}% reduction)")
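To apply the same idea to every variable at once, one option is a small helper, here called `downcast_floats` (hypothetical, not part of xsnow or xarray), that casts only float64 data variables and leaves integers, coordinates, and anything already float32 untouched. The demo dataset below is synthetic; whether it applies directly to the xsnow `ds` object assumes that object behaves like an `xarray.Dataset`.

```python
import numpy as np
import xarray as xr


def downcast_floats(dataset: xr.Dataset) -> xr.Dataset:
    """Return a copy of `dataset` with float64 data variables cast to float32.

    Hypothetical helper for illustration; coordinates and non-float64
    data variables are left unchanged.
    """
    out = dataset.copy()
    for name, da in dataset.data_vars.items():
        if da.dtype == np.float64:
            out[name] = da.astype(np.float32)
    return out


# Tiny synthetic dataset (made-up values, not xsnow output)
demo = xr.Dataset(
    {
        "density": (("time", "layer"), np.full((3, 4), 250.0)),
        "layer_count": (("time",), np.array([4, 4, 3])),
    },
    coords={"time": np.arange(3), "layer": np.arange(4)},
)
small = downcast_floats(demo)
print({name: str(da.dtype) for name, da in small.data_vars.items()})
print(f"{demo.nbytes} bytes -> {small.nbytes} bytes")
```

Restricting the cast to float64 avoids accidentally turning integer counts or flags into floats.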


## Part 2: Zarr Format for Large Datasets

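Zarr stores N-dimensional arrays as a grid of separately compressed chunks, so a reader can load only the chunks it needs instead of the whole file. This makes it well suited to large snowpack datasets such as long time series or many locations.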
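Because each chunk is stored and compressed on its own, the chunk shape decides both how many objects end up in the store and how much must be read for a query. The pure-Python sketch below estimates uncompressed bytes per chunk and the chunk count; `chunk_stats` is a hypothetical helper and the (time, layer) shape is an assumed example, not an xsnow default. A common rule of thumb is to aim for chunks of roughly 1 MB or more uncompressed.

```python
import math

import numpy as np


def chunk_stats(shape, chunks, dtype):
    """Uncompressed bytes per full chunk and total chunk count for an array."""
    itemsize = np.dtype(dtype).itemsize
    bytes_per_chunk = math.prod(chunks) * itemsize
    # Each dimension splits into ceil(size / chunk) pieces; the last may be partial
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    return bytes_per_chunk, n_chunks


# Assumed array: 180 days of hourly profiles with up to 200 layers, as (time, layer)
shape = (24 * 180, 200)
for chunks in [(24, 200), (24 * 30, 200), (24 * 180, 200)]:
    nbytes, n = chunk_stats(shape, chunks, "float32")
    print(f"chunks={chunks}: {nbytes / 1024**2:.2f} MB per chunk, {n} chunks")
```

In xarray, chunk shapes can be set per variable with the `chunks` key in the `encoding` passed to `to_zarr`, or by calling `.chunk()` on the dataset before writing.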

In [None]:
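# Example: save the dataset to Zarr with Blosc + Zstandard compression
from numcodecs import Blosc  # zarr.Blosc is a re-export of this class in zarr 2.x

zarr_path = "snowpack_data.zarr"
compressor = Blosc(cname='zstd', clevel=3)  # byte shuffle is enabled by default

try:
    ds.to_zarr(
        zarr_path,
        mode="w",  # overwrite an existing store so the cell can be re-run
        encoding={
            'density': {'compressor': compressor},
            'temperature': {'compressor': compressor},
        },
    )
    print(f"✅ Saved to {zarr_path}")
except Exception as e:
    print(f"Note: Zarr save failed ({e}). Check that zarr is installed and that "
          "'density' and 'temperature' exist in the dataset.")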
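Blosc's default byte shuffle regroups the bytes of each value by position (all first bytes, then all second bytes, and so on), which tends to put slowly varying high-order bytes next to each other and helps the compressor. The sketch below imitates that idea with NumPy and the standard-library `zlib` as a stand-in; it is not the Blosc implementation, and `byte_shuffle`/`byte_unshuffle` are hypothetical helpers. The temperature-like profile is synthetic.

```python
import zlib

import numpy as np


def byte_shuffle(values: np.ndarray) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on (Blosc-style shuffle)."""
    raw = np.ascontiguousarray(values).view(np.uint8).reshape(-1, values.dtype.itemsize)
    return raw.T.tobytes()


def byte_unshuffle(data: bytes, dtype, count: int) -> np.ndarray:
    """Invert byte_shuffle for `count` elements of `dtype`."""
    itemsize = np.dtype(dtype).itemsize
    planes = np.frombuffer(data, dtype=np.uint8).reshape(itemsize, count)
    return planes.T.copy().view(dtype).reshape(count)


# Smooth synthetic temperature-like profile in °C (made-up values)
temps = np.linspace(-15.0, 0.0, 10_000, dtype=np.float32)

plain = zlib.compress(temps.tobytes(), 6)
shuffled = zlib.compress(byte_shuffle(temps), 6)
print(f"raw: {temps.nbytes} B, zlib: {len(plain)} B, shuffle+zlib: {len(shuffled)} B")

# The shuffle is lossless: decompressing and unshuffling restores the exact values
restored = byte_unshuffle(zlib.decompress(shuffled), temps.dtype, temps.size)
assert np.array_equal(restored, temps)
```

How much the shuffle helps depends on the data; comparing the printed sizes on a real variable is a quick way to check before committing to a compressor setting.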


In [None]:
# Example: Load from Zarr format
try:
    ds_zarr = xr.open_zarr(zarr_path)
    print(f"✅ Loaded from Zarr: {zarr_path}")
    print(f"   Dimensions: {dict(ds_zarr.sizes)}")
except Exception as e:
    print(f"Note: Zarr load example (file may not exist): {e}")
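`xr.open_zarr` does not read array values up front; with dask installed the variables stay lazy until they are computed or loaded. At the storage level Zarr reads whole chunks, so selecting a slice costs every stored chunk that overlaps it. The sketch below computes those chunk indices along one axis; `chunks_touched` is a hypothetical helper and the 720-step chunk size is an assumed example.

```python
def chunks_touched(start: int, stop: int, chunk: int) -> range:
    """Indices of chunks along one axis that overlap the half-open slice [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk, (stop - 1) // chunk + 1)


# Assumed hourly series stored in 720-step (30-day) chunks
chunk = 720
week = chunks_touched(1000, 1000 + 24 * 7, chunk)  # one week starting at hour 1000
print(list(week))
```

A slice that straddles a chunk boundary reads two chunks even if it is short, which is one reason to align chunk boundaries with the windows you query most often.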


## Summary

✅ **What we learned:**

1. **Data type optimization**: Converting float64 to float32 to save memory
2. **Zarr format**: Chunked, compressed storage for large datasets
3. **Memory management**: Checking per-variable memory with `.nbytes` before and after converting data types

## Key Techniques

- **`.astype()`**: Convert data types
- **`.to_zarr()`**: Save to Zarr format
- **`xr.open_zarr()`**: Load from Zarr format

## Next Steps

Return to the main tutorial notebooks to continue learning xsnow.