# YAML Configuration and CLI Usage

This notebook demonstrates:
1. Creating YAML configuration files for pygSQuiG
2. Running simulations with the CLI
3. Configuration validation and best practices
4. Parameter sweeps and batch runs
5. Integration with HPC workflows

## 1. Configuration System Overview

pygSQuiG uses YAML files for configuration, which provides:
- **Human-readable** format
- **Version control friendly** - track parameter changes
- **Reproducibility** - share configs with papers
- **Validation** - catch errors before running
- **Flexibility** - override parameters from CLI
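For the reproducibility point, one option is to record a short fingerprint of the exact parameters next to the results, so a figure can be traced back to the config that produced it. The sketch below hashes a canonical JSON serialization of a config dict, with keys sorted so that key order does not change the fingerprint. `config_fingerprint` is a hypothetical helper, not part of pygSQuiG.

```python
import hashlib
import json


def config_fingerprint(config: dict, length: int = 12) -> str:
    """Return a short, key-order-independent SHA-256 fingerprint of a config dict.

    Hypothetical helper (not a pygSQuiG API). Keys are sorted during
    serialization, so dicts with the same content give the same hash.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = {"grid": {"N": 256, "L": 6.283185307179586}, "physics": {"alpha": 1.0}}
b = {"physics": {"alpha": 1.0}, "grid": {"L": 6.283185307179586, "N": 256}}
c = {"grid": {"N": 512, "L": 6.283185307179586}, "physics": {"alpha": 1.0}}

print(config_fingerprint(a))                            # 12 hex characters
print(config_fingerprint(a) == config_fingerprint(b))   # True: key order ignored
print(config_fingerprint(a) == config_fingerprint(c))   # False: N differs
```

The fingerprint could, for example, be appended to an output directory name; that convention is a suggestion, not something pygSQuiG does automatically.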

In [None]:
import yaml
import os
import numpy as np
from pathlib import Path

# Create a directory for our configurations
config_dir = Path("example_configs")
config_dir.mkdir(exist_ok=True)

print("pygSQuiG YAML configuration demonstration")
print(f"Working directory: {os.getcwd()}")
print(f"Config directory: {config_dir.absolute()}")

## 2. Basic Configuration Structure

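A pygSQuiG configuration file is organized into top-level sections: `simulation`, `grid`, `physics`, `timestepping`, `initial_condition`, and `output`. Later examples add optional sections such as `forcing`, `damping`, `passive_scalars`, `computation`, and `restart`.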

In [None]:
# Create a basic configuration
basic_config = {
    'simulation': {
        'name': 'basic_sqg_decay',
        'description': 'Basic SQG decaying turbulence example'
    },
    
    'grid': {
        'N': 256,
        'L': 2 * np.pi
    },
    
    'physics': {
        'alpha': 1.0,  # SQG
        'nu_p': 1e-16,
        'p': 8
    },
    
    'timestepping': {
        'dt': 0.001,
        't_final': 10.0,
        'adaptive': False
    },
    
    'initial_condition': {
        'type': 'random',
        'seed': 42,
        'energy': 1.0
    },
    
    'output': {
        'directory': 'output/basic_sqg',
        'save_interval': 0.1,
        'fields': ['theta', 'energy', 'enstrophy'],
        'format': 'netcdf'
    }
}

# Save to YAML file
config_file = config_dir / "basic_sqg.yaml"
with open(config_file, 'w') as f:
    yaml.dump(basic_config, f, default_flow_style=False, sort_keys=False)

print("Created basic configuration:")
print("\n" + "="*50)
with open(config_file, 'r') as f:
    print(f.read())
print("="*50)
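Because the YAML file is the record of the run, it is worth checking that reading it back reproduces the dict that was written. A minimal round-trip sketch with PyYAML's `safe_dump`/`safe_load` (the `roundtrip` helper is illustrative). Note that PyYAML writes `1e-16` as `1.0e-16`, the form its YAML 1.1 resolver reads back as a float.

```python
import math

import yaml


def roundtrip(config: dict) -> tuple[str, dict]:
    """Dump a config to YAML text and parse it back (illustrative helper)."""
    text = yaml.safe_dump(config, default_flow_style=False, sort_keys=False)
    return text, yaml.safe_load(text)


config = {
    "grid": {"N": 256, "L": 2 * math.pi},
    "physics": {"alpha": 1.0, "nu_p": 1e-16, "p": 8},
    "timestepping": {"dt": 0.001, "adaptive": False},
}

text, loaded = roundtrip(config)
print(text)                               # nu_p is written as 1.0e-16
print("Round trip exact:", loaded == config)
```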

## 3. Advanced Configuration Options

Let's create a more complex configuration with forcing and passive scalars:

In [None]:
# Advanced configuration with forcing and scalars
advanced_config = {
    'simulation': {
        'name': 'forced_sqg_with_scalars',
        'description': 'Forced SQG turbulence with passive scalar mixing'
    },
    
    'grid': {
        'N': 512,
        'L': 2 * np.pi
    },
    
    'physics': {
        'alpha': 1.0,
        'nu_p': 1e-16,
        'p': 8
    },
    
    'forcing': {
        'enabled': True,
        'type': 'ring',
        'k_forcing': 5,
        'epsilon': 0.1,
        'correlation_time': 0.1
    },
    
    'damping': {
        'enabled': True,
        'type': 'large_scale',
        'k_damping': 2,
        'damping_rate': 1.0
    },
    
    'passive_scalars': {
        'temperature': {
            'kappa': 1e-3,
            'initial_condition': {
                'type': 'gradient',
                'direction': 'x'
            }
        },
        'tracer': {
            'kappa': 1e-4,
            'initial_condition': {
                'type': 'blob',
                'center': [3.14, 3.14],
                'width': 0.5
            }
        }
    },
    
    'timestepping': {
        'adaptive': True,
        'cfl_number': 0.5,
        'dt_min': 1e-6,
        'dt_max': 0.01,
        't_final': 100.0
    },
    
    'output': {
        'directory': 'output/forced_sqg_scalars',
        'save_interval': 1.0,
        'checkpoint_interval': 10.0,
        'fields': ['theta', 'energy', 'enstrophy', 'temperature', 'tracer'],
        'diagnostics': {
            'spectra': True,
            'fluxes': True,
            'scalar_variance': True
        }
    },
    
    'restart': {
        'enabled': False,
        'checkpoint_file': None
    }
}

# Save advanced configuration
advanced_file = config_dir / "advanced_sqg.yaml"
with open(advanced_file, 'w') as f:
    yaml.dump(advanced_config, f, default_flow_style=False, sort_keys=False)

print("Created advanced configuration with:")
print("  - Ring forcing at k=5")
print("  - Large-scale damping")
print("  - Two passive scalars")
print("  - Adaptive timestepping")
print("  - Checkpoint/restart capability")
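Sections of a richer config can depend on each other: for example, each passive scalar declared under `passive_scalars` presumably needs a diffusivity, an initial condition, and an entry in `output.fields` if it is to be saved. A small cross-check, written as a hypothetical helper against the key names used in this notebook (not a pygSQuiG API):

```python
def check_scalars(config: dict) -> list[str]:
    """Return problems found in the passive_scalars section (hypothetical helper)."""
    problems = []
    scalars = config.get("passive_scalars", {})
    saved = set(config.get("output", {}).get("fields", []))
    for name, spec in scalars.items():
        kappa = spec.get("kappa")
        if kappa is None or kappa < 0:
            problems.append(f"scalar '{name}': kappa must be non-negative, got {kappa!r}")
        if "initial_condition" not in spec:
            problems.append(f"scalar '{name}' has no initial_condition")
        if name not in saved:
            problems.append(f"scalar '{name}' is evolved but not listed in output.fields")
    return problems


example = {
    "passive_scalars": {
        "temperature": {"kappa": 1e-3, "initial_condition": {"type": "gradient"}},
        "dye": {"kappa": -1e-4},  # deliberately broken entry
    },
    "output": {"fields": ["theta", "temperature"]},
}
for problem in check_scalars(example):
    print(" -", problem)
```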

## 4. Configuration Validation

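pygSQuiG validates configurations before running (`pygsquig-run --dry-run` performs the check without starting a simulation). The `validate_config` function below is a simplified illustration of the kinds of checks involved, not pygSQuiG's own validator: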

In [None]:
# Example validation function
def validate_config(config):
    """Basic configuration validation."""
    errors = []
    warnings = []
    
    # Check grid
    if 'grid' not in config:
        errors.append("Missing 'grid' section")
    else:
        N = config['grid'].get('N', 0)
        if N <= 0 or (N & (N-1)) != 0:  # Check power of 2
            errors.append(f"Grid size N={N} must be a positive power of 2")
    
    # Check physics
    if 'physics' not in config:
        errors.append("Missing 'physics' section")
    else:
        alpha = config['physics'].get('alpha')
        if alpha is None:
            errors.append("Missing 'physics.alpha'")
        elif not 0 <= alpha <= 2:
            warnings.append(f"Unusual alpha={alpha}, typical range is [0, 2]")
    
    # Check timestepping
    if 'timestepping' in config:
        if config['timestepping'].get('adaptive', False):
            if 'cfl_number' not in config['timestepping']:
                warnings.append("Adaptive timestepping enabled but no CFL number specified")
    
    # Check output
    if 'output' in config:
        interval = config['output'].get('save_interval', 0)
        t_final = config.get('timestepping', {}).get('t_final', 1)
        if interval > t_final / 2:
            warnings.append("Save interval is very large compared to simulation time")
    
    return errors, warnings

# Validate our configurations
for config_name, config_path in [("basic", config_file), ("advanced", advanced_file)]:
    print(f"\nValidating {config_name} configuration:")
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f)
    
    errors, warnings = validate_config(config)
    
    if errors:
        print("  ❌ ERRORS:")
        for error in errors:
            print(f"     - {error}")
    else:
        print("  ✓ No errors")
    
    if warnings:
        print("  ⚠️  WARNINGS:")
        for warning in warnings:
            print(f"     - {warning}")
    else:
        print("  ✓ No warnings")
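Range checks like the ones above assume values already have the right type. A common pitfall is that PyYAML follows YAML 1.1, whose float pattern requires a decimal point: a hand-written `nu_p: 1e-16` loads as the *string* `'1e-16'`, while `nu_p: 1.0e-16` loads as a float (this is why the generated files contain `1.0e-16`). A sketch of a type check that catches this; the `EXPECTED_TYPES` table is an assumption based on the keys in this notebook, not pygSQuiG's schema.

```python
import yaml

# Assumed expected types for a few keys used in this notebook (not an official schema)
EXPECTED_TYPES = {
    ("grid", "N"): int,
    ("grid", "L"): float,
    ("physics", "alpha"): float,
    ("physics", "nu_p"): float,
    ("physics", "p"): int,
}


def check_types(config: dict) -> list[str]:
    """Report keys whose loaded value does not have the expected type."""
    errors = []
    for (section, key), expected in EXPECTED_TYPES.items():
        value = config.get(section, {}).get(key)
        if value is None:
            continue  # missing keys are reported by the section checks, not here
        if isinstance(value, bool):
            ok = False  # bool is a subclass of int, so reject it explicitly
        elif expected is float:
            ok = isinstance(value, (int, float))  # accept e.g. L: 6
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(f"{section}.{key}={value!r} is {type(value).__name__}, "
                          f"expected {expected.__name__}")
    return errors


hand_written = """
grid: {N: 256, L: 6.283185307179586}
physics: {alpha: 1.0, nu_p: 1e-16, p: 8}
"""
hand_config = yaml.safe_load(hand_written)
print(repr(hand_config["physics"]["nu_p"]))  # '1e-16': loaded as a string
print(check_types(hand_config))
```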

## 5. Using the CLI

The `pygsquig-run` command runs simulations from YAML files:

In [None]:
# Summary of CLI usage (printed as static text; this cell does not call pygsquig-run)
print("Command line usage:")
print("\n" + "="*50)
print("pygsquig-run config.yaml [options]")
print("="*50)
print("\nOptions:")
print("  --dry-run             Validate config without running")
print("  --device gpu          Use GPU acceleration")
print("  --override KEY=VALUE  Override a config parameter (repeatable)")
print("  --profile             Enable profiling")
print("\nExamples:")
print("  pygsquig-run basic_sqg.yaml")
print("  pygsquig-run config.yaml --device gpu")
print("  pygsquig-run config.yaml --override physics.alpha=0.5")
print("  pygsquig-run config.yaml --override timestepping.t_final=20.0")
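To make the `--override section.key=value` syntax concrete, here is one way such an override could be applied to a loaded config: split on the first `=`, walk the dotted path, and parse the value with `yaml.safe_load` so numbers and booleans get YAML types. This is a hypothetical sketch, not pygsquig-run's actual implementation; in particular, the extra float fallback for forms like `1e-12` is added here because PyYAML alone would return a string, and the real CLI may behave differently.

```python
import copy
import re

import yaml

EXP_NUMBER = re.compile(r"[-+]?\d+(\.\d*)?[eE][-+]?\d+")


def parse_value(text: str):
    """Parse an override value with YAML rules (numbers, booleans, strings)."""
    value = yaml.safe_load(text)
    # PyYAML follows YAML 1.1, which reads '1e-12' (no decimal point) as a string
    if isinstance(value, str) and EXP_NUMBER.fullmatch(value):
        return float(value)
    return value


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one 'section.key=value' override applied."""
    path, sep, raw = override.partition("=")
    if not sep or not path:
        raise ValueError(f"expected section.key=value, got {override!r}")
    *parents, last = path.split(".")
    updated = copy.deepcopy(config)
    node = updated
    for key in parents:
        node = node.setdefault(key, {})  # create missing sections on the way down
    node[last] = parse_value(raw)
    return updated


base = {"physics": {"alpha": 1.0, "nu_p": 1e-16}, "timestepping": {"t_final": 10.0}}
cfg = base
for override in ["physics.alpha=0.5", "physics.nu_p=1e-12",
                 "timestepping.t_final=20.0", "computation.device=gpu"]:
    cfg = apply_override(cfg, override)
print(cfg)
```

Returning a new dict rather than editing in place keeps the loaded base config untouched, which matters when the same base feeds several runs.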

## 6. Parameter Sweeps

Create multiple configurations for parameter studies:

In [None]:
# Create parameter sweep configurations
sweep_dir = config_dir / "alpha_sweep"
sweep_dir.mkdir(exist_ok=True)

# Base configuration for sweep
base_sweep_config = {
    'simulation': {
        'name': 'alpha_sweep',
        'description': 'Parameter sweep over alpha values'
    },
    'grid': {
        'N': 256,
        'L': 2 * np.pi
    },
    'physics': {
        'nu_p': 1e-16,
        'p': 8
    },
    'timestepping': {
        'dt': 0.001,
        't_final': 20.0
    },
    'initial_condition': {
        'type': 'random',
        'seed': 42,
        'energy': 1.0
    },
    'output': {
        'save_interval': 0.5,
        'fields': ['theta', 'energy', 'spectrum']
    }
}

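# Create configs for different alpha values
import copy

alpha_values = [0.0, 0.5, 1.0, 1.5, 2.0]
sweep_files = []

for alpha in alpha_values:
    # Deep-copy the base config: dict.copy() is shallow, so the nested
    # 'physics', 'simulation', and 'output' dicts would be shared and mutated
    config = copy.deepcopy(base_sweep_config)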
    config['physics']['alpha'] = alpha
    config['simulation']['name'] = f'alpha_{alpha:.1f}'
    config['output']['directory'] = f'output/alpha_sweep/alpha_{alpha:.1f}'
    
    # Save config
    filename = sweep_dir / f"alpha_{alpha:.1f}.yaml"
    with open(filename, 'w') as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
    sweep_files.append(filename)

print(f"Created {len(sweep_files)} configuration files for alpha sweep:")
for f in sweep_files:
    print(f"  - {f.name}")

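# Create a batch script. Config paths are written relative to the notebook's
# working directory, so the script must be run from there.
batch_script = sweep_dir / "run_sweep.sh"
with open(batch_script, 'w') as f:
    f.write("#!/bin/bash\n")
    f.write("# Batch script for parameter sweep\n\n")
    # Use a distinct loop name so config_file from Section 2 is not overwritten
    for sweep_file in sweep_files:
        f.write(f"echo 'Running {sweep_file.name}'\n")
        f.write(f"pygsquig-run {sweep_file}\n")
        f.write("\n")

print(f"\nCreated batch script: {batch_script}")
print(f"Run from {Path.cwd()} with: bash {batch_script}")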
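The same pattern extends to sweeps over several parameters at once. A sketch that builds one config per combination with `itertools.product`, deep-copying the base each time so nothing is shared between runs; the `set_path` and `build_sweep` helpers and the `N` × `alpha` grid are illustrative, not part of pygSQuiG.

```python
import copy
import itertools


def set_path(config: dict, dotted: str, value) -> None:
    """Set config['a']['b'] = value for dotted='a.b', creating sections as needed."""
    *parents, last = dotted.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


def build_sweep(base: dict, axes: dict) -> list[dict]:
    """Return one deep-copied config per combination of the values in axes."""
    names = list(axes)
    configs = []
    for combo in itertools.product(*(axes[name] for name in names)):
        cfg = copy.deepcopy(base)
        tags = []
        for dotted, value in zip(names, combo):
            set_path(cfg, dotted, value)
            tags.append(f"{dotted.split('.')[-1]}_{value}")
        run_name = "__".join(tags)
        set_path(cfg, "simulation.name", run_name)
        set_path(cfg, "output.directory", f"output/sweep/{run_name}")
        configs.append(cfg)
    return configs


base = {"grid": {"N": 256}, "physics": {"alpha": 1.0, "nu_p": 1e-16, "p": 8}}
sweep = build_sweep(base, {"grid.N": [128, 256], "physics.alpha": [0.5, 1.0, 1.5]})
for cfg in sweep:
    print(cfg["simulation"]["name"])
```

Each config can then be written with `yaml.dump` exactly as in the alpha sweep above.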

## 7. HPC Integration

An example high-resolution configuration and SLURM job script for an HPC system:

In [None]:
# HPC-optimized configuration
hpc_config = {
    'simulation': {
        'name': 'large_scale_sqg',
        'description': 'High-resolution SQG for HPC'
    },
    
    'grid': {
        'N': 2048,  # Large grid
        'L': 2 * np.pi
    },
    
    'physics': {
        'alpha': 1.0,
        'nu_p': 1e-20,  # Very low dissipation
        'p': 8
    },
    
    'computation': {
        'device': 'gpu',  # Use GPU
        'precision': 'float64',  # Double precision
        'num_gpus': 1,
        'profile': True  # Enable profiling
    },
    
    'timestepping': {
        'adaptive': True,
        'cfl_number': 0.5,
        'dt_min': 1e-8,
        'dt_max': 0.001,
        't_final': 1000.0  # Long simulation
    },
    
    'output': {
        'directory': '/scratch/username/sqg_hires',  # Scratch filesystem
        'save_interval': 10.0,  # Save less frequently
        'checkpoint_interval': 100.0,  # Regular checkpoints
        'compression': 'gzip',  # Compress output
        'fields': ['theta'],  # Only essential fields
        'diagnostics': {
            'spectra': True,
            'compute_interval': 1.0  # Diagnostics every 1.0 time units, more often than snapshots
        }
    },
    
    'restart': {
        'enabled': True,
        'checkpoint_file': '/scratch/username/sqg_hires/checkpoint_latest.h5'
    }
}

# Save HPC configuration
hpc_file = config_dir / "hpc_sqg.yaml"
with open(hpc_file, 'w') as f:
    yaml.dump(hpc_config, f, default_flow_style=False, sort_keys=False)

# Create SLURM job script
slurm_script = config_dir / "submit_hpc.slurm"
with open(slurm_script, 'w') as f:
    f.write("""#!/bin/bash
#SBATCH --job-name=sqg_hires
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --time=24:00:00
#SBATCH --mem=32G
#SBATCH --output=sqg_%j.log

# Load modules
module load cuda/11.8
module load python/3.10

# Activate environment
source ~/envs/pygsquig/bin/activate

# Run simulation
pygsquig-run hpc_sqg.yaml --device gpu
""")

print("Created HPC configuration with:")
print("  - 2048×2048 grid")
print("  - GPU acceleration")
print("  - Checkpoint/restart")
print("  - Compressed output")
print(f"\nSLURM script: {slurm_script}")
print(f"Submit from {config_dir} (the script uses a relative config path) with: sbatch submit_hpc.slurm")
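When sizing scratch space for a run like this, a back-of-the-envelope estimate helps. An N×N float64 field takes 8N² bytes, so each snapshot of `theta` at N = 2048 is 32 MiB; with `t_final = 1000` and `save_interval = 10` that is about 100 snapshots, roughly 3 GiB before compression. A sketch of the arithmetic, assuming one real float64 array per saved field and a snapshot at t = 0; it ignores checkpoints, diagnostics, solver work arrays, and file-format overhead.

```python
MiB = 1024**2
GiB = 1024**3


def snapshot_bytes(N: int, n_fields: int = 1, bytes_per_value: int = 8) -> int:
    """Size of one snapshot holding n_fields real N x N arrays (float64 by default)."""
    return n_fields * N * N * bytes_per_value


def output_bytes(N: int, t_final: float, save_interval: float,
                 n_fields: int = 1) -> tuple[int, int]:
    """Return (number of snapshots, total uncompressed bytes), counting a snapshot at t = 0."""
    n_snapshots = round(t_final / save_interval) + 1
    return n_snapshots, n_snapshots * snapshot_bytes(N, n_fields)


# Values from hpc_sqg.yaml: N = 2048, t_final = 1000, save_interval = 10, fields = ['theta']
n_snap, total = output_bytes(N=2048, t_final=1000.0, save_interval=10.0)
print(f"one theta snapshot: {snapshot_bytes(2048) / MiB:.0f} MiB")      # 32 MiB
print(f"{n_snap} snapshots: {total / GiB:.2f} GiB before compression")  # 101 snapshots: 3.16 GiB
```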

## 8. Best Practices

### Configuration Management:
1. **Version control**: Keep configs in git
2. **Descriptive names**: Use meaningful simulation names
3. **Comments**: YAML supports comments with `#`, but `yaml.dump` does not write them, so keep documented configs as hand-edited text
4. **Templates**: Create base configs and modify
5. **Validation**: Always validate before long runs

### Performance Tips:
1. **Output frequency**: Balance detail vs storage
2. **Checkpoint interval**: Regular saves for long runs
3. **Field selection**: Only save what you need
4. **Compression**: Use for large datasets
5. **Scratch storage**: Use fast filesystems on HPC
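For the *Templates* practice, a common approach is to keep a shared base config and layer small, run-specific dicts on top with a recursive merge, so nested sections are updated key by key instead of being replaced wholesale. A minimal sketch; `deep_merge` is an illustrative name, not a pygSQuiG function.

```python
import copy


def deep_merge(base: dict, update: dict) -> dict:
    """Return a new dict: update's values override base's, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "grid": {"N": 256, "L": 6.283185307179586},
    "physics": {"alpha": 1.0, "nu_p": 1e-16, "p": 8},
    "output": {"save_interval": 0.1},
}
run = deep_merge(base, {"grid": {"N": 512}, "physics": {"alpha": 0.5}})
print(run["grid"])     # {'N': 512, 'L': 6.283185307179586}
print(run["physics"])  # {'alpha': 0.5, 'nu_p': 1e-16, 'p': 8}
```

A plain `{**base, **update}` would instead replace the whole `grid` and `physics` sections, silently dropping `L`, `nu_p`, and `p`.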

In [None]:
# Example: Well-documented configuration
documented_config = """
# SQG Turbulence Study Configuration
# Author: Your Name
# Date: 2024-01-15
# Paper: "Energy cascades in SQG turbulence"

simulation:
  name: sqg_cascade_study
  description: |
    Investigating energy cascade in SQG turbulence
    with ring forcing at intermediate scales.
    Parameters chosen to match theoretical predictions.

grid:
  N: 512  # Resolution sufficient for k^{-5/3} range
  L: 6.283185307179586  # 2π domain

physics:
  alpha: 1.0  # Surface QG
  nu_p: 1.0e-16  # Hyperviscosity coefficient
  p: 8  # Order chosen for sharp cutoff

forcing:
  enabled: true
  type: ring
  k_forcing: 10  # Force at intermediate scales
  epsilon: 0.1  # Energy injection rate
  # Ring width automatically set to ±1

timestepping:
  adaptive: true  # CFL-based adaptation
  cfl_number: 0.5  # Conservative for accuracy
  t_final: 1000.0  # Long run for statistics

output:
  directory: output/cascade_study
  save_interval: 10.0  # Snapshots every 10 time units
  diagnostics:
    spectra: true  # Essential for cascade analysis
    fluxes: true  # Energy transfer rates
    compute_interval: 0.1  # High-frequency diagnostics
"""

# Save documented configuration
doc_file = config_dir / "documented_example.yaml"
with open(doc_file, 'w') as f:
    f.write(documented_config)

print("Created well-documented configuration")
print("Key features:")
print("  - Clear documentation and references")
print("  - Explains parameter choices")
print("  - Includes physical reasoning")
print("  - Ready for publication")
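The comments in such a file survive only as long as it is treated as text: `yaml.safe_load` discards them, and `yaml.safe_dump` never writes any back. A quick check on a fragment of the documented config:

```python
import yaml

documented = """
# SQG Turbulence Study Configuration
forcing:
  enabled: true
  type: ring
  k_forcing: 10  # Force at intermediate scales
  epsilon: 0.1  # Energy injection rate
"""

parsed = yaml.safe_load(documented)
print(parsed["forcing"])  # {'enabled': True, 'type': 'ring', 'k_forcing': 10, 'epsilon': 0.1}

regenerated = yaml.safe_dump(parsed, default_flow_style=False, sort_keys=False)
print(regenerated)
print("comments survived:", "#" in regenerated)  # False
```

If a documented config has to be edited programmatically, a round-trip parser such as ruamel.yaml, which preserves comments, is a better fit than PyYAML.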

## 9. Advanced CLI Usage

### Override parameters from command line:

In [None]:
# Examples of CLI overrides
print("Parameter override examples:\n")

print("1. Change resolution:")
print("   pygsquig-run config.yaml --override grid.N=1024\n")

print("2. Modify physics:")
print("   pygsquig-run config.yaml --override physics.alpha=0.5 --override physics.nu_p=1e-12\n")

print("3. Change output location:")
print("   pygsquig-run config.yaml --override output.directory=/scratch/results\n")

print("4. Quick test run:")
print("   pygsquig-run config.yaml --override timestepping.t_final=1.0 --override output.save_interval=0.1\n")

print("5. Enable profiling:")
print("   pygsquig-run config.yaml --profile --override computation.profile=true\n")

# Create a test script that uses overrides
test_script = config_dir / "test_config.sh"
with open(test_script, 'w') as f:
    f.write("""#!/bin/bash
# Test configuration with different parameters (dry runs only)
cd "$(dirname "$0")"  # basic_sqg.yaml sits next to this script

CONFIG=basic_sqg.yaml

echo "Testing with low resolution..."
pygsquig-run "$CONFIG" --override grid.N=64 --override timestepping.t_final=1.0 --dry-run

echo
echo "Testing with different physics..."
pygsquig-run "$CONFIG" --override physics.alpha=0.0 --dry-run

echo
echo "Testing GPU mode..."
pygsquig-run "$CONFIG" --override computation.device=gpu --dry-run
""")

print(f"Created test script: {test_script}")
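The same dry-run checks can be driven from Python with `subprocess`, which makes it easy to loop over configs or override sets. A sketch that builds the argument list from the flags shown above (`--override`, `--dry-run`) and skips gracefully if `pygsquig-run` is not on `PATH`. The helper names are illustrative, and treating a non-zero exit status as a failed validation is an assumption about the CLI.

```python
import shutil
import subprocess


def build_command(config_path, overrides=(), dry_run=True) -> list[str]:
    """Assemble a pygsquig-run argument list (passed as a list, so no shell quoting)."""
    cmd = ["pygsquig-run", str(config_path)]
    for override in overrides:
        cmd += ["--override", override]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_dry(config_path, overrides=()):
    """Run a dry run; return the exit code, or None if the CLI is not installed."""
    if shutil.which("pygsquig-run") is None:
        print("pygsquig-run not found on PATH; skipping")
        return None
    result = subprocess.run(build_command(config_path, overrides),
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr)  # assumed to carry the validation messages
    return result.returncode


cmd = build_command("basic_sqg.yaml", ["grid.N=64", "timestepping.t_final=1.0"])
print(" ".join(cmd))
print("exit code:", run_dry("basic_sqg.yaml", ["grid.N=64"]))
```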

## 10. Configuration Templates

Create reusable templates for common scenarios:

In [None]:
# Template directory
template_dir = config_dir / "templates"
template_dir.mkdir(exist_ok=True)

# Template 1: Decaying turbulence
decay_template = {
    'simulation': {
        'name': '${NAME}',
        'description': 'Decaying turbulence template'
    },
    'grid': {
        'N': '${N:256}',
        'L': 6.283185307179586
    },
    'physics': {
        'alpha': '${ALPHA:1.0}',
        'nu_p': '${NU_P:1e-16}',
        'p': 8
    },
    'timestepping': {
        'dt': 0.001,
        't_final': '${T_FINAL:10.0}'
    },
    'output': {
        'directory': 'output/${NAME}',
        'save_interval': '${SAVE_INTERVAL:0.1}'
    }
}

# Template 2: Forced turbulence
forced_template = {
    'simulation': {
        'name': '${NAME}',
        'description': 'Forced turbulence template'
    },
    'grid': {
        'N': '${N:512}',
        'L': 6.283185307179586
    },
    'forcing': {
        'enabled': True,
        'type': 'ring',
        'k_forcing': '${K_FORCING:10}',
        'epsilon': '${EPSILON:0.1}'
    },
    'damping': {
        'enabled': '${USE_DAMPING:true}',
        'type': 'large_scale',
        'k_damping': 2
    }
}

# Save templates
for name, template in [("decay", decay_template), ("forced", forced_template)]:
    with open(template_dir / f"{name}_template.yaml", 'w') as f:
        yaml.dump(template, f, default_flow_style=False, sort_keys=False)

print("Created configuration templates:")
print("  - decay_template.yaml: For decaying turbulence studies")
print("  - forced_template.yaml: For forced turbulence studies")
print("\nUse environment variables to customize:")
print(f"  NAME=my_run N=1024 pygsquig-run {template_dir / 'decay_template.yaml'}")
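The `${VAR}` and `${VAR:default}` placeholders need something to expand them. The sketch below shows one way to do that in Python before handing a config to a run: walk the loaded template, substitute values from an environment mapping (or the default), and re-parse whole-value placeholders so `${N:256}` becomes the integer 256 rather than the string `'256'`. This is an illustrative expander, not necessarily how pygsquig-run resolves templates.

```python
import os
import re

import yaml

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")
EXP_NUMBER = re.compile(r"[-+]?\d+(\.\d*)?[eE][-+]?\d+")


def parse_scalar(text: str):
    """Parse text with YAML rules; also accept exponent forms like 1e-16 as float."""
    value = yaml.safe_load(text)
    # PyYAML follows YAML 1.1, which reads '1e-16' (no decimal point) as a string
    if isinstance(value, str) and EXP_NUMBER.fullmatch(value):
        return float(value)
    return value


def expand(node, env=os.environ):
    """Recursively expand ${VAR} / ${VAR:default} placeholders in a loaded template."""

    def lookup(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is None:
            raise KeyError(f"template variable {name} is not set and has no default")
        return default

    if isinstance(node, dict):
        return {key: expand(value, env) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item, env) for item in node]
    if isinstance(node, str):
        if PLACEHOLDER.fullmatch(node):
            # Whole-value placeholder: parse so numbers and booleans get real types
            return parse_scalar(PLACEHOLDER.sub(lookup, node))
        return PLACEHOLDER.sub(lookup, node)  # embedded placeholder stays a string
    return node


template = {
    "simulation": {"name": "${NAME}"},
    "grid": {"N": "${N:256}"},
    "physics": {"alpha": "${ALPHA:1.0}", "nu_p": "${NU_P:1e-16}"},
    "output": {"directory": "output/${NAME}"},
}
run = expand(template, env={"NAME": "my_run", "N": "1024"})
print(run)
```

Passing an explicit `env` mapping keeps the example independent of the shell; in practice the default `os.environ` would pick up variables such as `NAME=my_run N=1024`.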

## Summary

This notebook demonstrated:

1. **YAML configuration structure** for pygSQuiG
2. **Creating and validating** configuration files
3. **CLI usage** with pygsquig-run
4. **Parameter sweeps** for systematic studies
5. **HPC integration** with job scripts
6. **Best practices** for configuration management
7. **Advanced features** like overrides and templates

### Key Benefits:
- **Reproducibility**: Share configs with papers
- **Flexibility**: Override parameters as needed
- **Scalability**: From laptop to supercomputer
- **Maintainability**: Version control friendly

### Next Steps:
- Create configs for your specific problems
- Set up parameter sweeps
- Integrate with your HPC workflow
- Share configurations with collaborators

In [None]:
# Clean up example files (optional)
print(f"\nCreated example configurations in: {config_dir.absolute()}")
print("\nTo clean up example files, uncomment and run:")
print("# import shutil")
print("# shutil.rmtree(config_dir)")