# MOM6 Time Averaging with CrocoCamp

This notebook demonstrates how to use CrocoCamp's time averaging functionality to perform configurable temporal resampling of MOM6 NetCDF output files.

## Overview

The time averaging tool supports:
- **Monthly averaging**: One output file per month
- **Seasonal averaging**: One output file per season (DJF, MAM, JJA, SON)
- **Yearly averaging**: One output file per year
- **Rolling averages**: Single output file with rolling mean time series
- **Custom frequency**: Flexible resampling with pandas frequency strings

All outputs maintain compatibility with CrocoCamp's `WorkflowModelObs` workflow.
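The calendar-based modes above line up with pandas offset aliases. The sketch below gives a rough orientation and does not describe CrocoCamp's internals. It maps each mode to an alias and counts how many averaging periods one year of daily data produces. The `MODE_TO_FREQ` mapping is a hypothetical illustration, not the tool's confirmed configuration.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from averaging modes to pandas offset aliases;
# CrocoCamp's actual internal mapping may differ.
MODE_TO_FREQ = {
    "monthly": "MS",       # calendar months, labelled by their first day
    "seasonal": "QS-DEC",  # quarters starting Dec/Mar/Jun/Sep -> DJF, MAM, JJA, SON
    "yearly": "YS",        # calendar years, labelled by 1 January
}

# One year of daily values, matching the span of the sample data built below
times = pd.date_range("2010-01-01", "2010-12-31", freq="D")
series = pd.Series(np.arange(len(times), dtype=float), index=times)

# Number of averaging periods each mode yields for this year
bin_counts = {mode: len(series.resample(freq).mean()) for mode, freq in MODE_TO_FREQ.items()}
print(bin_counts)  # {'monthly': 12, 'seasonal': 5, 'yearly': 1}
```

The seasonal count is five rather than four because December-anchored quarters split DJF across the year boundary. January and February 2010 fall in the 2009-12 bin, and December 2010 opens a new one.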

In [None]:
import os
import tempfile
import numpy as np
import pandas as pd
import xarray as xr
import yaml
from pathlib import Path

# Import CrocoCamp time averaging functionality
from crococamp.io.time_averaging import MOM6TimeAverager, time_average_from_config

## 1. Create Sample MOM6 Data

First, let's create some mock MOM6 data for demonstration purposes.

In [None]:
def create_sample_mom6_data(start_date='2010-01-01', end_date='2010-12-31', freq='D'):
    """Create sample MOM6-like dataset."""
    
    # Generate time series
    times = pd.date_range(start_date, end_date, freq=freq)
    
    # Create spatial grid (simplified MOM6-like structure)
    xh = np.linspace(-80, -60, 20)  # Longitude 
    yh = np.linspace(30, 45, 15)    # Latitude
    zl = np.array([5, 15, 25, 50, 100, 200])  # Depth levels
    
    nt, nz, ny, nx = len(times), len(zl), len(yh), len(xh)
    
    # Create realistic-looking temperature data with seasonal cycle
    day_of_year = np.array([t.dayofyear for t in times])
    seasonal_cycle = 5 * np.sin(2 * np.pi * day_of_year / 365.25)
    
    # Base temperature field
    temp_base = 15.0 + seasonal_cycle[:, None, None, None]
    temp_noise = np.random.randn(nt, nz, ny, nx) * 0.5
    temperature = temp_base + temp_noise
    
    # Create salinity with smaller variations
    salinity = 35.0 + np.random.randn(nt, nz, ny, nx) * 0.1
    
    # Surface height with some variability
    ssh = np.random.randn(nt, ny, nx) * 0.2
    
    # Create dataset
    dataset = xr.Dataset(
        {
            'thetao': (('time', 'zl', 'yh', 'xh'), temperature),
            'so': (('time', 'zl', 'yh', 'xh'), salinity),
            'SSH': (('time', 'yh', 'xh'), ssh),
            'tos': (('time', 'yh', 'xh'), temperature[:, 0, :, :]),  # Surface temp
            'average_DT': (('time',), np.full(nt, 1.0)),  # 1 day intervals
        },
        coords={
            'time': times,
            'xh': xh,
            'yh': yh,
            'zl': zl,
        }
    )
    
    # Add attributes similar to MOM6
    dataset.attrs.update({
        'title': 'Sample MOM6 dataset for CrocoCamp demo',
        'grid_type': 'regular'
    })
    
    dataset['thetao'].attrs.update({
        'units': 'degC',
        'long_name': 'Sea Water Potential Temperature',
        'standard_name': 'sea_water_potential_temperature'
    })
    
    dataset['so'].attrs.update({
        'units': 'psu',
        'long_name': 'Sea Water Salinity',
        'standard_name': 'sea_water_salinity'
    })
    
    return dataset

# Create sample data
sample_data = create_sample_mom6_data()
print(f"Created sample dataset with dimensions: {dict(sample_data.sizes)}")
print(f"Time range: {sample_data.time.values[0]} to {sample_data.time.values[-1]}")
print(f"Variables: {list(sample_data.data_vars.keys())}")

## 2. Monthly Averaging Example

Let's demonstrate monthly averaging, which creates one output file per month.
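Conceptually, monthly averaging groups time steps by calendar month and averages each group. The sketch below shows that operation with pandas on a single synthetic series. It is not CrocoCamp's implementation. The duration-weighted variant assumes each record carries an interval length like the sample data's `average_DT`; whether the tool applies such weighting is not shown in this notebook. In xarray, the unweighted version is `ds.resample(time="MS").mean()`.

```python
import numpy as np
import pandas as pd

# Daily values for 2010, standing in for one grid cell of a MOM6 field
times = pd.date_range("2010-01-01", "2010-12-31", freq="D")
values = pd.Series(np.arange(len(times), dtype=float), index=times)

# Unweighted calendar-month mean: one value per month, labelled by month start
monthly_mean = values.resample("MS").mean()

# Duration-weighted alternative (assumption: weights play the role of average_DT).
# All intervals are 1 day here, so it matches the plain mean.
dt = pd.Series(np.ones(len(times)), index=times)
weighted = (values * dt).resample("MS").sum() / dt.resample("MS").sum()

print(len(monthly_mean))                     # 12
print(monthly_mean.iloc[0])                  # 15.0 (mean of days 0..30)
print(np.allclose(monthly_mean, weighted))   # True
```

Weighting only matters when records cover unequal intervals, for example when daily and 5-day output are mixed in the input files.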

In [None]:
# Create working directory
work_dir = Path('time_averaging_demo')
work_dir.mkdir(exist_ok=True)

# Save sample data
input_file = work_dir / 'sample_mom6_data.nc'
sample_data.to_netcdf(input_file)

# Create monthly averaging configuration
monthly_config = {
    'input_files_pattern': str(input_file),
    'output_directory': str(work_dir / 'monthly_output'),
    'averaging_window': 'monthly',
    'variables': ['thetao', 'so', 'SSH']  # Subset for faster processing
}

config_file = work_dir / 'monthly_config.yaml'
with open(config_file, 'w') as f:
    yaml.dump(monthly_config, f)

print(f"Configuration saved to: {config_file}")
print("\nConfiguration contents:")
print(yaml.dump(monthly_config, default_flow_style=False))

In [None]:
# Run monthly averaging
print("Running monthly averaging...")
monthly_files = time_average_from_config(str(config_file))

print(f"\nCreated {len(monthly_files)} monthly files:")
for f in sorted(monthly_files):
    print(f"  - {os.path.basename(f)}")

# Examine the first monthly file (earliest by filename); the context manager closes it
with xr.open_dataset(sorted(monthly_files)[0]) as sample_monthly:
    print("\nSample monthly file structure:")
    print(f"Variables: {list(sample_monthly.data_vars)}")
    print(f"Time coordinate: {sample_monthly.time.values}")
    print(f"Temperature range: {sample_monthly.thetao.min().values:.2f} to {sample_monthly.thetao.max().values:.2f} °C")

## 3. Seasonal Averaging Example

Now let's create seasonal averages (DJF, MAM, JJA, SON).
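The sample data covers only calendar year 2010, so DJF is incomplete: January and February sit at the start of the record and December at the end. The sketch below shows how one common approach, pandas resampling on December-anchored quarters, handles this by producing two partial DJF bins. This is an illustration only. Another approach, xarray's `ds.groupby("time.season")`, would instead pool January, February and December 2010 into a single DJF group. Which behaviour CrocoCamp follows is not established here.

```python
import pandas as pd

# One marker per day of 2010
times = pd.date_range("2010-01-01", "2010-12-31", freq="D")
days = pd.Series(1, index=times)

# Quarters anchored on December give DJF, MAM, JJA, SON bins
per_season = days.resample("QS-DEC").count()
for start, n_days in per_season.items():
    print(start.date(), n_days)
# 2009-12-01 59   (DJF: only Jan + Feb 2010 present)
# 2010-03-01 92
# 2010-06-01 92
# 2010-09-01 91
# 2010-12-01 31   (DJF: only Dec 2010 present)
```

For full seasons at both ends, the input should start on 1 December and end on the last day of February.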

In [None]:
# Create seasonal averaging configuration
seasonal_config = {
    'input_files_pattern': str(input_file),
    'output_directory': str(work_dir / 'seasonal_output'),
    'averaging_window': 'seasonal',
    'variables': ['thetao', 'tos']  # Focus on temperature variables
}

seasonal_config_file = work_dir / 'seasonal_config.yaml'
with open(seasonal_config_file, 'w') as f:
    yaml.dump(seasonal_config, f)

# Run seasonal averaging
print("Running seasonal averaging...")
seasonal_files = time_average_from_config(str(seasonal_config_file))

print(f"\nCreated {len(seasonal_files)} seasonal files:")
for f in sorted(seasonal_files):
    print(f"  - {os.path.basename(f)}")

## 4. Rolling Average Example

Rolling averages create a smoothed time series in a single output file.
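A rolling mean keeps one value per original time step, and each value averages the data inside a window around that step. The sketch below applies a centred 30-day time-based window with pandas, mirroring the `window_size: '30D'` and `center: True` settings used next. Whether CrocoCamp uses pandas offset windows or xarray's count-based `ds.rolling(time=30, center=True)` is an assumption left open. The two differ at the edges: pandas offset windows default to `min_periods=1`, while xarray's default requires a full window and returns NaN near the ends.

```python
import numpy as np
import pandas as pd

# Synthetic daily SSH-like signal with an annual cycle
times = pd.date_range("2010-01-01", "2010-12-31", freq="D")
ssh = pd.Series(np.sin(2 * np.pi * np.arange(len(times)) / 365.25), index=times)

# 30-day time-based window centred on each timestamp
# (center=True with an offset window requires pandas >= 1.3)
rolled = ssh.rolling("30D", center=True).mean()

# Same length as the input: one smoothed value per original time step
print(len(rolled) == len(ssh))  # True

# Offset windows default to min_periods=1, so edge values average fewer days
counts = ssh.rolling("30D", center=True).count()
print(counts.iloc[0], counts.iloc[len(counts) // 2])
```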

In [None]:
# Create rolling average configuration (30-day window)
rolling_config = {
    'input_files_pattern': str(input_file),
    'output_directory': str(work_dir / 'rolling_output'),
    'averaging_window': {
        'type': 'rolling',
        'window_size': '30D',
        'center': True
    },
    'variables': ['SSH', 'tos']  # Fast variables for demo
}

rolling_config_file = work_dir / 'rolling_config.yaml'
with open(rolling_config_file, 'w') as f:
    yaml.dump(rolling_config, f)

# Run rolling averaging
print("Running rolling average (30-day window)...")
rolling_files = time_average_from_config(str(rolling_config_file))

print(f"\nCreated {len(rolling_files)} rolling average file:")
for f in rolling_files:
    print(f"  - {os.path.basename(f)}")

# Examine the rolling average file; the context manager closes it
with xr.open_dataset(rolling_files[0]) as rolling_data:
    print("\nRolling average file structure:")
    print(f"Time points: {len(rolling_data.time)}")
    print(f"Time range: {rolling_data.time.values[0]} to {rolling_data.time.values[-1]}")

## 5. Custom Frequency Example

For more flexibility, you can specify custom resampling frequencies using pandas frequency strings.
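Fixed-length pandas frequencies such as `'10D'` split the record into consecutive bins, starting at midnight of the first timestamp by default (`origin='start_day'`). The sketch below shows this for the 2010 sample span and illustrates how the final bin can be partial. It uses plain pandas and assumes the `freq` value in the config is interpreted the same way, which this notebook does not confirm.

```python
import pandas as pd

# One marker per day of 2010
times = pd.date_range("2010-01-01", "2010-12-31", freq="D")
days = pd.Series(1, index=times)

# '10D' bins are anchored at the first timestamp by default
per_bin = days.resample("10D").count()
print(len(per_bin))      # 37
print(per_bin.iloc[-1])  # 5 -> the final bin holds only Dec 27-31
```

Other standard aliases such as `'5D'`, `'W-MON'` or `'2W'` work the same way. Check whether a short final bin is acceptable before using its average.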

In [None]:
# Create custom frequency configuration (10-day averages)
custom_config = {
    'input_files_pattern': str(input_file),
    'output_directory': str(work_dir / 'custom_output'),
    'averaging_window': {
        'type': 'custom',
        'freq': '10D'  # 10-day periods
    },
    'variables': ['tos']  # Just surface temperature for speed
}

custom_config_file = work_dir / 'custom_config.yaml'
with open(custom_config_file, 'w') as f:
    yaml.dump(custom_config, f)

# Run custom averaging
print("Running custom frequency averaging (10-day periods)...")
custom_files = time_average_from_config(str(custom_config_file))

print(f"\nCreated {len(custom_files)} 10-day average files:")
for f in sorted(custom_files)[:5]:  # Show first 5
    print(f"  - {os.path.basename(f)}")
if len(custom_files) > 5:
    print(f"  ... and {len(custom_files) - 5} more")

## 6. Using the Command Line Interface

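The time averaging functionality is also available as the `time-average` subcommand of the CrocoCamp CLI (`python -m crococamp.cli.crococamp_cli time-average <config.yaml>`), which takes the same YAML configuration file: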
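When calling the CLI from a notebook, launch it with the kernel's own interpreter (`sys.executable`) rather than a bare `python`. Otherwise a different interpreter on `PATH` may run without CrocoCamp installed. The helper below is hypothetical; it only assembles the argument list used in this notebook and prints a shell-ready version of it.

```python
import shlex
import sys
from pathlib import Path


def build_time_average_command(config_path):
    """Hypothetical helper: argv for the time-average subcommand used in this notebook."""
    # sys.executable pins the interpreter running this kernel, so the CLI sees
    # the same installed crococamp package as the notebook itself
    return [sys.executable, "-m", "crococamp.cli.crococamp_cli",
            "time-average", str(Path(config_path))]


cmd = build_time_average_command("time_averaging_demo/cli_config.yaml")
print(shlex.join(cmd))  # paste into a terminal to run the same job outside Jupyter
```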

In [None]:
# Display the CLI help
import subprocess
import sys

# Use the notebook kernel's interpreter so the CLI sees the same installed packages
cli_module = [sys.executable, '-m', 'crococamp.cli.crococamp_cli']

# Show main CLI help
result = subprocess.run(cli_module + ['--help'], capture_output=True, text=True)
print("Main CLI help:")
print(result.stdout)

# Show time-average specific help
result = subprocess.run(cli_module + ['time-average', '--help'],
                        capture_output=True, text=True)
print("\nTime averaging CLI help:")
print(result.stdout)

In [None]:
# Example of running via CLI
cli_output_dir = work_dir / 'cli_output'
cli_config = {
    'input_files_pattern': str(input_file),
    'output_directory': str(cli_output_dir),
    'averaging_window': 'monthly',
    'variables': ['tos']
}

cli_config_file = work_dir / 'cli_config.yaml'
with open(cli_config_file, 'w') as f:
    yaml.dump(cli_config, f)

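# Run via CLI with the kernel's own interpreter (sys.executable) rather than
# whatever 'python' happens to be first on PATH
import sys

print("Running time averaging via CLI...")
result = subprocess.run(
    [sys.executable, '-m', 'crococamp.cli.crococamp_cli', 'time-average', str(cli_config_file)],
    capture_output=True, text=True,
)

print(f"CLI exit code: {result.returncode}")
print("CLI output:")
print(result.stdout)
if result.stderr:
    print("CLI errors:")
    print(result.stderr)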

# Check created files
if cli_output_dir.exists():
    cli_files = list(cli_output_dir.glob('*.nc'))
    print(f"\nCLI created {len(cli_files)} files:")
    for f in sorted(cli_files):
        print(f"  - {f.name}")

## 7. Workflow Compatibility

All output files from the time averaging tool are designed to be compatible with CrocoCamp's `WorkflowModelObs` workflow. Let's verify this by checking the structure of our output files.

In [None]:
# Check compatibility of output files
def check_mom6_compatibility(filepath):
    """Check if file structure is compatible with MOM6/CrocoCamp workflows."""
    
    with xr.open_dataset(filepath) as ds:
        checks = {
            'has_time_coord': 'time' in ds.coords,
            'has_spatial_coords': all(coord in ds.coords for coord in ['xh', 'yh']),
            # len() fails on a 0-d (scalar) time coordinate, so check the dimension size instead
            'time_is_array': ds.sizes.get('time', 0) >= 1,
            'has_mom6_variables': any(var in ds.variables for var in ['thetao', 'so', 'SSH', 'tos']),
            'preserves_attributes': bool(ds.attrs),
        }
        
        # Check variable attributes
        if 'thetao' in ds.variables:
            checks['temp_has_units'] = 'units' in ds['thetao'].attrs
            checks['temp_units_correct'] = ds['thetao'].attrs.get('units') == 'degC'
        
        return checks

# Test compatibility of different output types
test_files = {
    'Monthly': monthly_files[0] if monthly_files else None,
    'Seasonal': seasonal_files[0] if seasonal_files else None,
    'Rolling': rolling_files[0] if rolling_files else None,
}

print("Compatibility check results:")
print("=============================")

for name, filepath in test_files.items():
    if filepath and os.path.exists(filepath):
        print(f"\n{name} file: {os.path.basename(filepath)}")
        checks = check_mom6_compatibility(filepath)
        for check, result in checks.items():
            status = "✓" if result else "✗"
            print(f"  {status} {check}: {result}")
    else:
        print(f"\n{name} file: Not available")

## 8. Summary

This notebook demonstrated the key features of CrocoCamp's MOM6 time averaging tool:

1. **Monthly Averaging**: Creates separate files for each month
2. **Seasonal Averaging**: Creates files for meteorological seasons
3. **Rolling Averages**: Creates smoothed time series in a single file
4. **Custom Frequencies**: Flexible resampling with pandas frequency strings
5. **CLI Usage**: Command-line interface for batch processing
6. **Workflow Compatibility**: Output files work with existing CrocoCamp workflows

### Key Benefits:

- **Scalable**: Uses dask for memory-efficient processing of large datasets
- **Configurable**: YAML-based configuration for reproducible workflows
- **Compatible**: Maintains MOM6 variable naming and attributes
- **Extensible**: Designed for easy extension to other ocean models

### Next Steps:

1. Apply to your own MOM6 output files
2. Experiment with different averaging windows
3. Use averaged files in CrocoCamp's `WorkflowModelObs` for model-observation comparisons
4. Combine with CrocoCamp's visualization tools for analysis
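Before pointing `input_files_pattern` at your own output, it can help to confirm which files the pattern actually matches. The sketch below assumes the pattern is expanded as a standard shell-style glob. The key name suggests this, but this notebook only ever passes a single file path. The helper `preview_matches` and the file names are hypothetical.

```python
import glob
import os
import tempfile


def preview_matches(pattern, limit=5):
    """Hypothetical pre-flight check: list the files a glob pattern matches."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    for path in matches[:limit]:
        print(os.path.basename(path))
    return matches


# Demo with throwaway files; the names are illustrative, not a required convention
with tempfile.TemporaryDirectory() as tmp:
    for name in ["19930101.ocean_daily.nc", "19940101.ocean_daily.nc", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = preview_matches(os.path.join(tmp, "*.ocean_daily.nc"))
    print(len(found))  # 2
```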

In [None]:
# Clean up demo files (optional)
import shutil

# Uncomment to clean up:
# shutil.rmtree(work_dir)
# print(f"Cleaned up demo directory: {work_dir}")

print(f"Demo files are available in: {work_dir}")
print("\nTo clean up, run: shutil.rmtree('time_averaging_demo')")