# YAML Configuration Tutorial

This tutorial shows how to define and execute workflows using YAML configuration files.

## Why YAML Configuration?

YAML configuration provides several benefits:

- **Declarative**: Define what you want, not how to do it
- **Version Control**: Track workflow changes over time
- **Non-Technical Users**: Allow users to modify workflows without coding
- **Template Reuse**: Create reusable workflow templates
- **Validation**: Automatic schema validation of configurations

## Setup

First, let's import the necessary components:

In [None]:
import sys
sys.path.insert(0, '../src')

import yaml
import tempfile
import os
from pathlib import Path

from orchestrator.compiler.yaml_compiler import YAMLCompiler
from orchestrator.compiler.ambiguity_resolver import AmbiguityResolver
from orchestrator.core.model import MockModel, ModelCapabilities
from orchestrator.orchestrator import Orchestrator
from orchestrator.state.state_manager import InMemoryStateManager

print("✅ All components imported successfully!")

## Basic YAML Pipeline Structure

Let's start with a simple YAML pipeline definition:

In [None]:
# Define a simple YAML pipeline
simple_yaml = """
pipeline:
  id: "simple_yaml_pipeline"
  name: "Simple YAML Pipeline"
  description: "A basic pipeline defined in YAML"
  version: "1.0.0"

models:
  - name: "demo-model"
    provider: "mock"
    capabilities:
      supported_tasks: ["generate", "analyze"]
      max_tokens: 1000
      supports_streaming: true

tasks:
  - id: "greeting"
    name: "Generate Greeting"
    action: "generate"
    model: "demo-model"
    parameters:
      prompt: "Generate a friendly greeting for a new user"
      max_tokens: 50
      temperature: 0.7

  - id: "analysis"
    name: "Analyze Greeting"
    action: "analyze"
    model: "demo-model"
    parameters:
      prompt: "Analyze the tone and style of this greeting: {greeting.result}"
      analysis_type: "tone"
    dependencies:
      - "greeting"
"""

print("📄 Simple YAML pipeline defined:")
print(simple_yaml)

## Compiling YAML to Pipeline

Now let's compile this YAML into an executable pipeline:

In [None]:
# Create compiler and resolver
resolver = AmbiguityResolver()
compiler = YAMLCompiler(resolver=resolver)

# Parse the YAML
config = yaml.safe_load(simple_yaml)

# Compile to pipeline and models
pipeline, models = compiler.compile(config)

print("✅ Compiled successfully!")
print(f"📊 Pipeline: {pipeline.name} with {len(pipeline)} tasks")
print(f"🤖 Models: {len(models)} model(s) created")
print(f"📋 Tasks: {list(pipeline)}")

# Show execution order
execution_order = pipeline.get_execution_order()
print(f"⚡ Execution order: {execution_order}")
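`get_execution_order()` groups tasks into levels, where each level depends only on earlier ones. The sketch below is a rough mental model of that grouping, not the framework's actual code. It uses a level-by-level variant of Kahn's topological sort. The function name `execution_levels` and its dict-of-dependency-lists input are assumptions made for illustration:

```python
def execution_levels(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group task ids into levels so each task's dependencies sit in earlier levels."""
    # Fail fast on references to tasks that were never defined
    for task_id, deps in dependencies.items():
        for dep in deps:
            if dep not in dependencies:
                raise ValueError(f"{task_id!r} depends on unknown task {dep!r}")

    remaining = {task_id: set(deps) for task_id, deps in dependencies.items()}
    levels = []
    while remaining:
        # Every task with no unfinished dependencies can run in this level
        ready = sorted(task_id for task_id, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        levels.append(ready)
        for task_id in ready:
            del remaining[task_id]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


# Dependencies from simple_yaml: analysis waits for greeting
print(execution_levels({"greeting": [], "analysis": ["greeting"]}))
# [['greeting'], ['analysis']]
```

If no task is ready while some remain, the leftover tasks must form a cycle. That is why the sketch reports a cycle at that point.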

## Setting Up Mock Responses

For demonstration, let's configure responses for our mock model:

In [None]:
# Find our mock model and set up responses
demo_model = None
for model in models:
    if model.name == "demo-model":
        demo_model = model
        break

if demo_model and hasattr(demo_model, 'set_response'):
    demo_model.set_response(
        "Generate a friendly greeting for a new user",
        "Welcome! 🎉 We're delighted to have you join our community. "
        "Feel free to explore and don't hesitate to ask if you need any help!"
    )
    
    demo_model.set_response(
        "Analyze the tone and style of this greeting",
        "This greeting has a warm, welcoming tone with an enthusiastic and supportive style. "
        "It uses inclusive language ('our community') and encouraging phrases that make "
        "the new user feel valued and supported."
    )
    
    print("✅ Mock responses configured")
else:
    print("⚠️ Could not configure mock responses")
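The second key above is only the start of the real prompt. At run time, the `{greeting.result}` placeholder is replaced by the generated greeting. How the framework's `MockModel` matches prompts isn't shown in this tutorial, but prefix matching is one scheme that would make this work. The sketch below is a hypothetical stand-in (`CannedResponseModel`), not the library's class:

```python
class CannedResponseModel:
    """Hypothetical mock model: returns canned text for prompts that start with a registered prefix."""

    def __init__(self, name: str, default: str = "Mock response") -> None:
        self.name = name
        self.default = default
        self._responses: dict[str, str] = {}

    def set_response(self, prompt_prefix: str, response: str) -> None:
        self._responses[prompt_prefix] = response

    def generate(self, prompt: str) -> str:
        # Prefer the longest matching prefix so more specific keys win
        matches = [prefix for prefix in self._responses if prompt.startswith(prefix)]
        if not matches:
            return self.default
        return self._responses[max(matches, key=len)]


canned_model = CannedResponseModel("demo-model")
canned_model.set_response("Analyze the tone and style of this greeting", "Warm and welcoming.")
print(canned_model.generate("Analyze the tone and style of this greeting: Welcome! 🎉"))
# Warm and welcoming.
```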

## Executing the YAML Pipeline

Now let's create an orchestrator and execute our YAML-defined pipeline:

In [None]:
async def execute_yaml_pipeline():
    """Execute the YAML-defined pipeline."""
    # Create orchestrator
    state_manager = InMemoryStateManager()
    orchestrator = Orchestrator(state_manager=state_manager)
    
    # Register all models
    for model in models:
        orchestrator.register_model(model)
    
    print(f"🚀 Executing YAML pipeline: {pipeline.name}")
    print(f"📊 Initial progress: {pipeline.get_progress()}")
    
    # Execute pipeline
    result = await orchestrator.execute_pipeline(pipeline)
    
    print(f"\n✅ Pipeline execution completed: {result}")
    print(f"📊 Final progress: {pipeline.get_progress()}")
    
    # Show task results
    print("\n📋 Task Results:")
    for task_id in pipeline:
        task = pipeline.get_task(task_id)
        print(f"  🔸 {task.name}: {task.status}")
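        if task.result:
            # Results are not guaranteed to be plain strings, so convert before truncating
            result_text = str(task.result)
            result_preview = result_text[:100] + "..." if len(result_text) > 100 else result_text
            print(f"      Result: {result_preview}")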
    
    return result

# Execute the pipeline
result = await execute_yaml_pipeline()
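Conceptually, the orchestrator walks the execution levels in order, and tasks within one level do not depend on each other, so they can run concurrently. The sketch below illustrates that idea with `asyncio.gather`. Both `run_levels` and the `echo_upstream` task runner are hypothetical. The real `Orchestrator` may schedule, retry, and persist state differently:

```python
import asyncio


async def run_levels(levels, run_task):
    """Run the tasks of each level concurrently; start a level only after the previous one finishes."""
    results = {}
    for level in levels:
        # gather() returns outputs in the same order as the level's task ids;
        # each task gets a snapshot of the results finished so far
        outputs = await asyncio.gather(*(run_task(task_id, dict(results)) for task_id in level))
        results.update(zip(level, outputs))
    return results


async def echo_upstream(task_id, upstream):
    """Hypothetical task runner: reports which upstream results it could see."""
    await asyncio.sleep(0)  # stand-in for an awaited model call
    return sorted(upstream)
```

In this notebook, `await run_levels([["greeting"], ["analysis"]], echo_upstream)` returns `{'greeting': [], 'analysis': ['greeting']}`. In a plain script, wrap the call in `asyncio.run(...)` instead.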

## Advanced YAML Features

### 1. Template Variables and AUTO Resolution

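Values declared in the top-level `variables` block can be referenced anywhere in the pipeline with double-brace `{{variable}}` syntax. Single-brace placeholders such as `{personalized_content.result}` work differently. They refer to the output of an upstream task, which is only available at execution time: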

In [None]:
# YAML with template variables
template_yaml = """
pipeline:
  id: "template_pipeline"
  name: "Template Pipeline"
  description: "Demonstrates template variable resolution"

models:
  - name: "smart-model"
    provider: "mock"
    capabilities:
      supported_tasks: ["generate", "analyze", "summarize"]
      max_tokens: 2000

variables:
  user_name: "Alice"
  topic: "artificial intelligence"
  max_words: 150

tasks:
  - id: "personalized_content"
    name: "Generate Personalized Content"
    action: "generate"
    model: "smart-model"
    parameters:
      prompt: "Write a personalized introduction to {{topic}} for {{user_name}}"
      max_tokens: "{{max_words}}"
      temperature: 0.8

  - id: "quality_check"
    name: "Check Content Quality"
    action: "analyze"
    model: "smart-model"
    parameters:
      prompt: "Rate the quality and clarity of this content for {{user_name}}: {personalized_content.result}"
      analysis_type: "quality"
    dependencies:
      - "personalized_content"

  - id: "final_summary"
    name: "Create Summary"
    action: "summarize"
    model: "smart-model"
    parameters:
      prompt: "Summarize this analysis: {quality_check.result}"
    dependencies:
      - "quality_check"
"""

print("📄 Template YAML with variables defined:")
print(template_yaml)
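The substitution step for `{{variable}}` placeholders can be sketched with a regular expression that matches double braces only. Single-brace task references then pass through untouched for later resolution. This is an illustrative helper (`substitute_variables` is a hypothetical name). It assumes that a value consisting only of a placeholder, like `"{{max_words}}"`, should keep the variable's original type (the integer 150) instead of becoming the string `"150"`. The tutorial doesn't confirm that behavior for the real compiler:

```python
import re

# Matches {{name}} (optionally with inner spaces) but not single-brace {task.result} references
VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute_variables(value, variables):
    """Recursively replace {{name}} placeholders in strings, lists, and dicts.

    An undeclared variable name raises KeyError.
    """
    if isinstance(value, dict):
        return {key: substitute_variables(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_variables(item, variables) for item in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, None pass through unchanged
    whole = VAR_PATTERN.fullmatch(value)
    if whole:
        # The value is nothing but a placeholder: keep the variable's own type
        return variables[whole.group(1)]
    # Otherwise splice the variable's text into the surrounding string
    return VAR_PATTERN.sub(lambda match: str(variables[match.group(1)]), value)


variables = {"user_name": "Alice", "topic": "artificial intelligence", "max_words": 150}
params = {
    "prompt": "Write a personalized introduction to {{topic}} for {{user_name}}",
    "max_tokens": "{{max_words}}",
    "temperature": 0.8,
}
print(substitute_variables(params, variables))
# {'prompt': 'Write a personalized introduction to artificial intelligence for Alice', 'max_tokens': 150, 'temperature': 0.8}
```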

### 2. Compiling Template Pipeline

In [None]:
# Compile the template pipeline
template_config = yaml.safe_load(template_yaml)
template_pipeline, template_models = compiler.compile(template_config)

print(f"✅ Template pipeline compiled: {template_pipeline.name}")
print(f"📊 Tasks: {len(template_pipeline)}")
print(f"🤖 Models: {len(template_models)}")

# Examine how variables were resolved
for task_id in template_pipeline:
    task = template_pipeline.get_task(task_id)
    print(f"\n🔸 Task: {task.name}")
    print(f"   Parameters: {task.parameters}")
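The compiled parameters can still contain single-brace references such as `{quality_check.result}`, because those are only filled in once the upstream task has run. Here is a minimal sketch of that runtime step. It assumes results are kept in a dict keyed by task id, and `resolve_result_refs` is a hypothetical helper:

```python
import re

# Single-brace reference to an upstream task's output, e.g. {quality_check.result};
# the lookarounds skip anything wrapped in double braces
RESULT_REF = re.compile(r"(?<!\{)\{(\w+)\.result\}(?!\})")


def resolve_result_refs(prompt: str, results: dict[str, str]) -> str:
    """Replace {task_id.result} placeholders with the outputs of completed tasks."""

    def lookup(match):
        task_id = match.group(1)
        if task_id not in results:
            raise KeyError(f"prompt references {task_id!r}, which has no result yet")
        return str(results[task_id])

    return RESULT_REF.sub(lookup, prompt)


prompt = "Summarize this analysis: {quality_check.result}"
print(resolve_result_refs(prompt, {"quality_check": "Clear and well structured."}))
# Summarize this analysis: Clear and well structured.
```

Raising on a missing result shows why every referenced task should also be listed under `dependencies`. Declaring the dependency guarantees the upstream result exists before the prompt is built.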

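### 3. Complex Pipeline with Parallel Branches

Let's create a more complex pipeline that uses two models, pipeline-level and task-level metadata, and parallel branches. Both `grammar_check` and `audience_fit` depend only on `draft`, so neither has to wait for the other, and `final_edit` merges their results: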

In [None]:
# Complex YAML pipeline
complex_yaml = """
pipeline:
  id: "content_workflow"
  name: "Content Creation Workflow"
  description: "Multi-stage content creation and review process"
  version: "2.1.0"
  metadata:
    author: "AI Orchestrator"
    category: "content"
    priority: "high"

models:
  - name: "creative-writer"
    provider: "mock"
    capabilities:
      supported_tasks: ["generate", "edit"]
      max_tokens: 2000
      supports_streaming: true
      languages: ["en", "es", "fr"]

  - name: "quality-reviewer"
    provider: "mock"
    capabilities:
      supported_tasks: ["analyze", "review"]
      max_tokens: 1000
      supports_structured_output: true

variables:
  content_type: "blog_post"
  target_audience: "software_developers"
  word_count: 500
  language: "en"

tasks:
  # Stage 1: Content Generation
  - id: "outline"
    name: "Create Content Outline"
    action: "generate"
    model: "creative-writer"
    parameters:
      prompt: "Create an outline for a {{content_type}} targeting {{target_audience}}"
      max_tokens: 200
      temperature: 0.7
    metadata:
      stage: "planning"
      priority: 1

  - id: "draft"
    name: "Write First Draft"
    action: "generate"
    model: "creative-writer"
    parameters:
      prompt: "Write a {{word_count}}-word {{content_type}} based on this outline: {outline.result}"
      max_tokens: "{{word_count}}"
      temperature: 0.8
      language: "{{language}}"
    dependencies:
      - "outline"
    metadata:
      stage: "creation"
      priority: 2

  # Stage 2: Quality Review
  - id: "grammar_check"
    name: "Grammar and Style Check"
    action: "analyze"
    model: "quality-reviewer"
    parameters:
      prompt: "Check grammar, style, and readability of this content: {draft.result}"
      analysis_type: "grammar"
    dependencies:
      - "draft"
    metadata:
      stage: "review"
      priority: 3

  - id: "audience_fit"
    name: "Audience Fit Analysis"
    action: "analyze"
    model: "quality-reviewer"
    parameters:
      prompt: "Analyze if this content fits the target audience {{target_audience}}: {draft.result}"
      analysis_type: "audience"
    dependencies:
      - "draft"
    metadata:
      stage: "review"
      priority: 3

  # Stage 3: Final Processing
  - id: "final_edit"
    name: "Apply Final Edits"
    action: "edit"
    model: "creative-writer"
    parameters:
      prompt: "Apply these improvements to the content:\nGrammar: {grammar_check.result}\nAudience: {audience_fit.result}\nOriginal: {draft.result}"
      max_tokens: "{{word_count}}"
    dependencies:
      - "grammar_check"
      - "audience_fit"
    metadata:
      stage: "finalization"
      priority: 4

  - id: "quality_score"
    name: "Final Quality Assessment"
    action: "review"
    model: "quality-reviewer"
    parameters:
      prompt: "Provide a quality score (1-10) and brief assessment for this final content: {final_edit.result}"
      output_format: "structured"
    dependencies:
      - "final_edit"
    metadata:
      stage: "assessment"
      priority: 5
"""

print("📄 Complex YAML pipeline defined with multiple stages and models")
print(f"📏 YAML length: {len(complex_yaml.split())} words")

### 4. Compiling and Analyzing Complex Pipeline

In [None]:
# Compile the complex pipeline
complex_config = yaml.safe_load(complex_yaml)
complex_pipeline, complex_models = compiler.compile(complex_config)

print(f"✅ Complex pipeline compiled: {complex_pipeline.name}")
print("📊 Pipeline info:")
print(f"   Tasks: {len(complex_pipeline)}")
print(f"   Models: {len(complex_models)}")
print(f"   Version: {complex_pipeline.version}")
print(f"   Description: {complex_pipeline.description}")

# Analyze execution order and critical path
execution_order = complex_pipeline.get_execution_order()
critical_path = complex_pipeline.get_critical_path()

print("\n⚡ Execution Analysis:")
print(f"   Execution levels: {len(execution_order)}")
for i, level in enumerate(execution_order):
    print(f"     Level {i+1}: {level}")
print(f"   Critical path: {' → '.join(critical_path)}")

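# Show model assignments. In the YAML, `model` is a task-level field rather than
# an entry in `parameters`, so read it from the parsed configuration.
task_models = {t["id"]: t.get("model", "No model specified") for t in complex_config["tasks"]}

print("\n🤖 Model Assignments:")
for task_id in complex_pipeline:
    task = complex_pipeline.get_task(task_id)
    model_name = task_models.get(task_id, "No model specified")
    stage = task.metadata.get('stage', 'unknown')
    print(f"   {task.name}: {model_name} (stage: {stage})")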
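The tutorial doesn't specify how `get_critical_path()` weights tasks. A common definition is the longest chain of dependent tasks, where each task contributes its estimated duration. The sketch below computes that with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `critical_path` function and its optional `duration` mapping are assumptions, and every task costs one unit by default:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(dependencies, duration=None):
    """Return the longest chain of dependent tasks, weighting each task by its duration (default 1)."""
    duration = duration or {}
    finish = {}       # earliest finish time of each task
    predecessor = {}  # upstream task that determines each task's start time
    # TopologicalSorter expects {task: iterable of tasks it depends on}
    for task_id in TopologicalSorter(dependencies).static_order():
        start, parent = 0.0, None
        for dep in sorted(dependencies.get(task_id, [])):
            if finish[dep] > start:  # ties keep the alphabetically first dependency
                start, parent = finish[dep], dep
        finish[task_id] = start + duration.get(task_id, 1.0)
        predecessor[task_id] = parent
    # Walk backwards from the task that finishes last
    task_id = max(sorted(finish), key=finish.get)
    path = []
    while task_id is not None:
        path.append(task_id)
        task_id = predecessor[task_id]
    return path[::-1]


# Dependencies from complex_yaml
deps = {
    "outline": [],
    "draft": ["outline"],
    "grammar_check": ["draft"],
    "audience_fit": ["draft"],
    "final_edit": ["grammar_check", "audience_fit"],
    "quality_score": ["final_edit"],
}
print(" → ".join(critical_path(deps)))
# outline → draft → audience_fit → final_edit → quality_score
```

With equal weights, `grammar_check` and `audience_fit` tie, and this sketch breaks the tie alphabetically. Giving `grammar_check` a larger duration moves it onto the critical path.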

## YAML Validation and Error Handling

The framework validates YAML configurations when they are compiled. The configuration below deliberately contains several mistakes:

In [None]:
# Example of invalid YAML that would be caught
invalid_yaml = """
pipeline:
  # Missing required 'id' field
  name: "Invalid Pipeline"

tasks:
  - id: "task1"
    name: "Task 1"
    # Missing required 'action' field
    parameters:
      prompt: "Test"
    dependencies:
      - "nonexistent_task"  # Invalid dependency
"""

print("🚫 Example of invalid YAML (would cause validation errors):")
print("   - Missing pipeline.id")
print("   - Missing task.action")
print("   - Invalid dependency reference")
print("   - No models defined")

# Compile the invalid configuration; the compiler should reject it
try:
    invalid_config = yaml.safe_load(invalid_yaml)
    invalid_pipeline, invalid_models = compiler.compile(invalid_config)
    print("\n⚠️ Compilation succeeded; these problems were not caught")
except Exception as e:
    print(f"\n❌ Validation error (expected): {e}")
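To make the four problems listed above concrete, here is a minimal structural check over the parsed configuration. It is a hypothetical sketch (`validate_config`), not the compiler's schema. The tutorial doesn't show the compiler's exact rules or exception types:

```python
def validate_config(config: dict) -> list[str]:
    """Return human-readable problems found in a parsed pipeline config (empty list means OK)."""
    errors = []
    pipeline = config.get("pipeline") or {}
    if "id" not in pipeline:
        errors.append("pipeline.id is required")

    model_names = {m.get("name") for m in config.get("models") or []}
    if not model_names:
        errors.append("no models defined")

    tasks = config.get("tasks") or []
    task_ids = {t.get("id") for t in tasks}
    for task in tasks:
        tid = task.get("id", "<missing id>")
        if "action" not in task:
            errors.append(f"task {tid!r}: action is required")
        if "model" in task and task["model"] not in model_names:
            errors.append(f"task {tid!r}: unknown model {task['model']!r}")
        for dep in task.get("dependencies") or []:
            if dep not in task_ids:
                errors.append(f"task {tid!r}: unknown dependency {dep!r}")
    return errors


# Same structure as invalid_yaml above, written as the dict yaml.safe_load would return
invalid = {
    "pipeline": {"name": "Invalid Pipeline"},
    "tasks": [
        {
            "id": "task1",
            "name": "Task 1",
            "parameters": {"prompt": "Test"},
            "dependencies": ["nonexistent_task"],
        }
    ],
}
for problem in validate_config(invalid):
    print("-", problem)
```

In this notebook, `validate_config(yaml.safe_load(invalid_yaml))` should report the same four problems, because the dict above mirrors `invalid_yaml`.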

## Saving and Loading YAML Configurations

YAML configurations can be saved to files and loaded later:

In [None]:
# Save pipeline configuration to file
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
    f.write(template_yaml)
    yaml_file_path = f.name

print(f"💾 YAML configuration saved to: {yaml_file_path}")

# Load configuration from file
with open(yaml_file_path, 'r') as f:
    loaded_config = yaml.safe_load(f)

# Compile loaded configuration
loaded_pipeline, loaded_models = compiler.compile(loaded_config)

print(f"📂 Loaded pipeline: {loaded_pipeline.name}")
print(f"   Tasks: {len(loaded_pipeline)}")
print(f"   Variables defined: {loaded_config.get('variables', {})}")

# Clean up
os.unlink(yaml_file_path)
print("🗑️ Temporary file cleaned up")
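The cell above has one weakness. If `compiler.compile` raises, `os.unlink` never runs and the temporary file is left behind. A context manager built on `tempfile.TemporaryDirectory` removes the file even when an error occurs. `temporary_config_file` is a hypothetical helper name:

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_config_file(text, suffix=".yaml"):
    """Yield a Path to a temp file containing text; the file is removed on exit, even on error."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / f"pipeline{suffix}"
        path.write_text(text, encoding="utf-8")
        yield path


with temporary_config_file("pipeline:\n  id: demo\n") as path:
    print(path.name, path.read_text(encoding="utf-8").splitlines()[0])
print("still exists:", path.exists())
# pipeline.yaml pipeline:
# still exists: False
```

In the notebook you could write `with temporary_config_file(template_yaml) as path:` and then load and compile `path` inside the block.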

## Best Practices for YAML Pipelines

Here are some best practices when creating YAML pipelines:

In [None]:
# Best practices example
best_practices_yaml = """
# 1. Always include comprehensive metadata
pipeline:
  id: "best_practices_pipeline"
  name: "Best Practices Demo"
  description: "Demonstrates YAML pipeline best practices"
  version: "1.0.0"
  metadata:
    author: "Your Name"
    created: "2024-01-01"
    tags: ["demo", "best-practices"]
    documentation: "https://docs.example.com/pipelines/best-practices"

# 2. Define reusable variables at the top
variables:
  model_name: "production-model"
  max_tokens: 1000
  temperature: 0.7
  retry_count: 3

# 3. Clearly separate models by purpose
models:
  - name: "{{model_name}}"
    provider: "openai"
    capabilities:
      supported_tasks: ["generate"]
      max_tokens: "{{max_tokens}}"
    requirements:
      api_key_required: true

# 4. Use clear, descriptive task names and organize by stages
tasks:
  # Input Stage
  - id: "validate_input"
    name: "Validate User Input"
    action: "generate"
    model: "{{model_name}}"
    parameters:
      prompt: "Validate this input: {{user_input}}"
      max_tokens: 100
      temperature: 0.1  # Low temperature for validation
    metadata:
      stage: "input"
      description: "Ensures input meets quality standards"
      timeout: 30
      max_retries: "{{retry_count}}"

  # Processing Stage
  - id: "process_content"
    name: "Process User Content"
    action: "generate"
    model: "{{model_name}}"
    parameters:
      prompt: "Process this validated content: {validate_input.result}"
      max_tokens: "{{max_tokens}}"
      temperature: "{{temperature}}"
    dependencies:
      - "validate_input"
    metadata:
      stage: "processing"
      description: "Main content processing logic"
      timeout: 120
      max_retries: "{{retry_count}}"

  # Output Stage
  - id: "format_output"
    name: "Format Final Output"
    action: "generate"
    model: "{{model_name}}"
    parameters:
      prompt: "Format this content for presentation: {process_content.result}"
      max_tokens: 200
      temperature: 0.3  # Lower temperature for formatting
    dependencies:
      - "process_content"
    metadata:
      stage: "output"
      description: "Formats content for end-user consumption"
      timeout: 60
      max_retries: "{{retry_count}}"
"""

print("✅ Best Practices YAML Structure:")
print("   📋 Comprehensive metadata and documentation")
print("   🔧 Reusable variables for configuration")
print("   🏗️ Clear stage-based organization")
print("   📝 Descriptive names and comments")
print("   ⚙️ Appropriate parameters for each task type")
print("   🔄 Retry and timeout configurations")
print("   🎯 Temperature tuning based on task purpose")
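The best-practices example references `{{user_input}}`, which is not declared under `variables`. The tutorial doesn't say whether it is meant to be supplied at run time. A small lint pass can flag names like this so they are either declared or documented. This sketch (`undefined_variables` is a hypothetical helper) scans the raw YAML text for `{{name}}` placeholders:

```python
import re

VAR_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def undefined_variables(yaml_text: str, declared: dict) -> list[str]:
    """List {{name}} placeholders in the text that have no entry in `declared`."""
    used = set(VAR_PATTERN.findall(yaml_text))
    return sorted(used - set(declared))


snippet = """
variables:
  model_name: "production-model"
tasks:
  - id: "validate_input"
    model: "{{model_name}}"
    parameters:
      prompt: "Validate this input: {{user_input}}"
"""
print(undefined_variables(snippet, {"model_name": "production-model"}))
# ['user_input']
```

Applied to `best_practices_yaml` with `yaml.safe_load(best_practices_yaml)["variables"]`, it should report `['user_input']`.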

## Summary

In this tutorial, you learned:

1. **YAML Structure**: How to define pipelines, models, and tasks in YAML
2. **Variable Templates**: Using `{{variable}}` syntax for reusable configurations
3. **Compilation Process**: Converting YAML to executable pipelines
4. **Advanced Features**: Multi-stage pipelines with complex dependencies
5. **Validation**: Automatic error detection and prevention
6. **Best Practices**: Organizing and documenting YAML configurations

## Benefits of YAML Configuration

- **🔄 Reusability**: Create template workflows for different scenarios
- **📖 Readability**: Clear, human-readable workflow definitions
- **🔧 Maintainability**: Easy to modify workflows without code changes
- **✅ Validation**: Automatic checking prevents configuration errors
- **📚 Documentation**: Self-documenting workflow definitions
- **🎯 Flexibility**: Support for variables and template resolution

## Next Steps

- Explore **Advanced Model Integration** for real AI providers
- Learn about **Error Handling and Recovery** in YAML pipelines
- Try **Production Deployment** with YAML configurations
- Check out **Schema Validation** for custom validation rules

---

**Happy configuring! 📝🎵**