# Unity ML-Agents Complete Training Guide

## Comprehensive Step-by-Step Tutorial for Training AI Agents with ONNX Export

This notebook provides a complete guide for creating, training, and deploying ML-Agents in Unity with ONNX model export. Each section includes detailed explanations, code examples, and best practices.

### What You'll Learn:
- Complete Unity ML-Agents setup and installation
- Creating and configuring Unity scenes for training
- Writing C# agent scripts
- Training configuration and hyperparameter tuning
- Model training and monitoring
- ONNX model export and deployment
- Advanced techniques and troubleshooting

---

## 1. Installation and Environment Setup

### 1.1 Unity Installation

**Unity Requirements:**
- Unity 2021.3 LTS or newer (recommended)
- Windows Build Support or Mac Build Support modules
- Unity Hub for easy version management

**Installation Steps:**
1. Download Unity Hub from [unity.com](https://unity.com/download)
2. Install Unity 2021.3 LTS through Unity Hub
3. Include platform-specific build modules during installation

### 1.2 Python Environment Setup

In [None]:
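# Check Python version (this guide assumes 3.7-3.9; the supported range
# depends on the ML-Agents release you install, so check its installation docs)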
import sys
print(f"Python version: {sys.version}")

# Verify version compatibility
version_info = sys.version_info
if version_info.major == 3 and 7 <= version_info.minor <= 9:
    print("✅ Python version is compatible with ML-Agents")
else:
    print("⚠️ Warning: Python version may not be fully compatible. Recommended: 3.7-3.9")

### 1.3 Install Required Python Packages

In [None]:
# Install core ML-Agents packages plus the libraries used later in this notebook
!pip install mlagents torch matplotlib tensorboard onnx onnxruntime numpy pandas pyyaml

# Verify installations
import mlagents.trainers
import torch
import matplotlib
import onnx

print("✅ All packages installed successfully!")
# The version string lives in mlagents.trainers, not in the top-level package
print(f"ML-Agents version: {mlagents.trainers.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"ONNX version: {onnx.__version__}")

### 1.4 Verify ML-Agents CLI

In [None]:
# Test ML-Agents command line interface
!mlagents-learn --help

---

## 2. Unity Project Setup

### 2.1 Install ML-Agents Package in Unity

**Steps to add ML-Agents to your Unity project:**

1. Open your Unity project
2. Go to **Window → Package Manager**
3. Click the **"+"** button and select **"Add package by name..."**
4. Enter the package name: `com.unity.ml-agents`
5. Click **"Add"**

**Alternative method (for specific versions):**
- Download from [ML-Agents GitHub](https://github.com/Unity-Technologies/ml-agents)
- Add as local package
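Once the package is added, Unity records it in the project's `Packages/manifest.json` under `dependencies`. The sketch below checks for that entry from Python. The project path and the helper name `mlagents_package_version` are illustrative assumptions, not part of ML-Agents.

```python
import json
from pathlib import Path

def mlagents_package_version(manifest_text):
    """Return the com.unity.ml-agents entry from manifest.json text, or None."""
    manifest = json.loads(manifest_text)
    return manifest.get("dependencies", {}).get("com.unity.ml-agents")

# Hypothetical project location; adjust to your own Unity project folder.
manifest_path = Path("MyUnityProject/Packages/manifest.json")
if manifest_path.exists():
    entry = mlagents_package_version(manifest_path.read_text(encoding="utf-8"))
    print(f"ML-Agents package entry: {entry}")
else:
    print(f"No manifest found at {manifest_path}")
```

A result of `None` means the package is not listed as a direct dependency of the project.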

### 2.2 Project Structure Setup

In [None]:
# Create directory structure for ML-Agents project
import os

# Define project structure
project_dirs = [
    "Unity_ML_Project/Training",
    "Unity_ML_Project/Training/configs",
    "Unity_ML_Project/Training/results",
    "Unity_ML_Project/Models",
    "Unity_ML_Project/Scripts"
]

# Create directories
for directory in project_dirs:
    os.makedirs(directory, exist_ok=True)
    print(f"📁 Created: {directory}")

print("\n✅ Project structure created successfully!")

---

## 3. Creating Your First Agent

### 3.1 Unity Scene Setup

**Create a new training scene:**

1. **File → New Scene** and save as `TrainingScene`
2. **Add Ground Plane:** GameObject → 3D Object → Plane
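3. **Create Training Area:** Empty GameObject named `TrainingArea`; the ground plane, agent, and target should all be children of it, because the agent script works in local positions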
4. **Add Agent:** GameObject → 3D Object → Sphere, name it `Agent`
5. **Add Target:** GameObject → 3D Object → Cube, name it `Target`

**Component Setup for Agent:**
- Add `Rigidbody` component
- Add `Behavior Parameters` component
- Add `Decision Requester` component (without it, your own script must call `RequestDecision()`)
- Add your custom agent script (see below)

### 3.2 Agent Script (C# for Unity)

Create a file called `SimpleAgent.cs` in your Unity project:

In [None]:
# This cell shows the C# script content - copy this to Unity
simple_agent_script = '''
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class SimpleAgent : Agent
{
    [Header("Agent Settings")]
    public Transform target;
    public float moveSpeed = 5f;
    public float maxDistance = 8f;  // Keep above the largest start distance (about 5.7 with a centered agent and a corner target)
    
    private Rigidbody agentRb;
    private Vector3 startingPosition;
    
    public override void Initialize()
    {
        agentRb = GetComponent<Rigidbody>();
        startingPosition = transform.localPosition;
    }
    
    public override void OnEpisodeBegin()
    {
        // Reset agent position and velocity
        transform.localPosition = startingPosition;
        agentRb.velocity = Vector3.zero;
        agentRb.angularVelocity = Vector3.zero;
        
        // Randomize target position
        target.localPosition = new Vector3(
            Random.Range(-4f, 4f),
            0.5f,
            Random.Range(-4f, 4f)
        );
    }
    
    public override void CollectObservations(VectorSensor sensor)
    {
        // Agent position (3 values)
        sensor.AddObservation(transform.localPosition);
        
        // Agent velocity (3 values)
        sensor.AddObservation(agentRb.velocity);
        
        // Target position (3 values)
        sensor.AddObservation(target.localPosition);
        
        // Distance to target (1 value)
        sensor.AddObservation(Vector3.Distance(transform.localPosition, target.localPosition));
        
        // Total observations: 10
    }
    
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Get continuous actions
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        
        // Apply movement
        Vector3 movement = new Vector3(moveX, 0, moveZ);
        agentRb.AddForce(movement * moveSpeed);
        
        // Calculate distance to target
        float distanceToTarget = Vector3.Distance(transform.localPosition, target.localPosition);
        
        // Reward shaping
        if (distanceToTarget < 1.5f)
        {
            // Reached target
            SetReward(1.0f);
            EndEpisode();
        }
        else if (transform.localPosition.y < 0)
        {
            // Fell off platform
            SetReward(-1.0f);
            EndEpisode();
        }
        else if (distanceToTarget > maxDistance)
        {
            // Too far from target
            SetReward(-1.0f);
            EndEpisode();
        }
        else
        {
            // Per-step time penalty plus a small proximity bonus (larger when closer)
            SetReward(-0.01f + (1f / distanceToTarget) * 0.01f);
        }
    }
    
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Manual control for testing
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis("Horizontal");
        continuousActions[1] = Input.GetAxis("Vertical");
    }
}
'''

# Save the script to file
with open('Unity_ML_Project/Scripts/SimpleAgent.cs', 'w') as f:
    f.write(simple_agent_script)

print("✅ SimpleAgent.cs script created!")
print("📋 Copy this script to your Unity project's Scripts folder")
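To see what the reward branches in `OnActionReceived` produce, the per-step reward can be mirrored in Python. This is an illustrative sketch only: `step_reward` is a hypothetical helper, the constants are copied from the C# script, `max_distance` stands in for the script's `maxDistance` field, and physics is ignored.

```python
def step_reward(distance, max_distance, agent_y=0.5):
    """Mirror of the C# reward branches; returns (reward, episode_done)."""
    if distance < 1.5:
        return 1.0, True             # reached target
    if agent_y < 0:
        return -1.0, True            # fell off platform
    if distance > max_distance:
        return -1.0, True            # too far from target
    # time penalty plus proximity bonus
    return -0.01 + (1.0 / distance) * 0.01, False

# 5.0 is only an example value; use whatever your agent's maxDistance is.
for d in (1.0, 2.0, 4.0, 6.0):
    reward, done = step_reward(d, max_distance=5.0)
    print(f"distance={d:.1f} reward={reward:+.4f} done={done}")
```

Because the proximity bonus never exceeds 0.01 / 1.5, the in-between reward stays negative at every distance, so the agent is always pushed to finish the episode quickly.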

### 3.3 Behavior Parameters Configuration

**Configure the Behavior Parameters component:**

1. **Behavior Name:** `SimpleAgent` (must match YAML config)
2. **Vector Observation → Space Size:** `10` (from our CollectObservations)
3. **Actions → Continuous Actions → Space Size:** `2` (X and Z movement)
4. **Behavior Type:** `Default` (for training) or `Heuristic Only` (for manual testing)

**Decision Requester Settings:**
- **Decision Period:** `5` (the agent requests a decision every 5 Academy steps)
- **Take Actions Between Decisions:** `true`
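The Space Size values must equal what the agent script actually writes and reads. A small sketch of that bookkeeping follows; the dictionary restates the `AddObservation` calls from `CollectObservations`, and the key names are illustrative.

```python
# Number of floats contributed by each AddObservation call in SimpleAgent.cs
observation_parts = {
    "agent_position": 3,      # Vector3
    "agent_velocity": 3,      # Vector3
    "target_position": 3,     # Vector3
    "distance_to_target": 1,  # float
}
# One entry per index read from actions.ContinuousActions
continuous_actions = ["move_x", "move_z"]

vector_observation_size = sum(observation_parts.values())
print(f"Vector Observation Space Size: {vector_observation_size}")
print(f"Continuous Actions Space Size: {len(continuous_actions)}")
```

If you add or remove an observation in the script, update the Space Size to the new sum before training.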

---

## 4. Training Configuration

### 4.1 Basic Training Configuration (YAML)

In [None]:
# Create training configuration
basic_config = '''
behaviors:
  SimpleAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
'''

# Save configuration
with open('Unity_ML_Project/Training/configs/simple_agent_config.yaml', 'w') as f:
    f.write(basic_config)

print("✅ Basic training configuration created!")
print("📁 Saved to: Unity_ML_Project/Training/configs/simple_agent_config.yaml")
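PPO collects `buffer_size` experiences before each policy update and then processes them in minibatches of `batch_size`, so the buffer should be several times larger than the batch. The sketch below checks a hyperparameter dictionary for that relationship. The exact-multiple check is a convention assumed here rather than a hard ML-Agents requirement, and `check_ppo_sizes` is a hypothetical helper.

```python
def check_ppo_sizes(hyperparameters):
    """Return a list of warnings about batch_size / buffer_size (empty if fine)."""
    batch = hyperparameters["batch_size"]
    buffer = hyperparameters["buffer_size"]
    warnings = []
    if buffer < batch:
        warnings.append("buffer_size is smaller than batch_size")
    elif buffer % batch != 0:
        warnings.append("buffer_size is not a multiple of batch_size")
    return warnings

basic = {"batch_size": 128, "buffer_size": 2048}
print(check_ppo_sizes(basic) or "sizes look consistent")
print(f"minibatches per pass over the buffer: {basic['buffer_size'] // basic['batch_size']}")
```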

### 4.2 Advanced Training Configuration

In [None]:
# Create advanced configuration with more options
advanced_config = '''
behaviors:
  SimpleAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.99
        strength: 0.02
        network_settings:
          normalize: false
          hidden_units: 64
          num_layers: 2
        learning_rate: 3.0e-4
    # Requires a recorded demonstration file at demo_path; remove this block if you have none
    behavioral_cloning:
      demo_path: ./demonstrations/SimpleAgent.demo
      strength: 0.5
      steps: 150000
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 50000
    threaded: false
'''

# Save advanced configuration
with open('Unity_ML_Project/Training/configs/advanced_agent_config.yaml', 'w') as f:
    f.write(advanced_config)

print("✅ Advanced training configuration created!")
print("📁 Saved to: Unity_ML_Project/Training/configs/advanced_agent_config.yaml")
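With `learning_rate_schedule: linear`, the learning rate shrinks as training approaches `max_steps`. The sketch below illustrates the idea with a plain linear interpolation. The floor value `min_lr` and the exact formula are assumptions for illustration, not ML-Agents' internal implementation.

```python
def linear_lr(step, initial_lr=3.0e-4, max_steps=1_000_000, min_lr=1e-10):
    """Linearly decay from initial_lr at step 0 to min_lr at max_steps."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return min_lr + (initial_lr - min_lr) * fraction_left

for step in (0, 250_000, 500_000, 1_000_000):
    print(f"step {step:>9,}: lr = {linear_lr(step):.2e}")
```

A practical consequence is that resuming a run close to `max_steps` trains with a very small learning rate, so raise `max_steps` if you plan to continue training.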

### 4.3 SAC (Soft Actor-Critic) Configuration

In [None]:
# Create SAC configuration for continuous control tasks
sac_config = '''
behaviors:
  SimpleAgent:
    trainer_type: sac
    hyperparameters:
      learning_rate: 3.0e-4
      learning_rate_schedule: constant
      batch_size: 128
      buffer_size: 50000
      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 1
      save_replay_buffer: false
      init_entcoef: 1.0
      reward_signal_steps_per_update: 1
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
'''

# Save SAC configuration
with open('Unity_ML_Project/Training/configs/sac_agent_config.yaml', 'w') as f:
    f.write(sac_config)

print("✅ SAC training configuration created!")
print("📁 Saved to: Unity_ML_Project/Training/configs/sac_agent_config.yaml")
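In SAC, `tau` controls how quickly the target network tracks the trained network (a soft, or Polyak, update). The following is a minimal sketch of that update rule on plain Python lists; real trainers apply it to network tensors, and `soft_update` is a hypothetical helper.

```python
def soft_update(target_weights, online_weights, tau=0.005):
    """Move each target weight a fraction tau of the way toward the online weight."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_weights, online_weights)]

target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = soft_update(target, online)
print(target)
```

A small `tau` such as 0.005 makes the target change slowly, which tends to stabilize value estimates at the cost of slower tracking.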

---

## 5. Training Execution

### 5.1 Pre-Training Setup

In [None]:
# Setup training environment
import os
import subprocess
from datetime import datetime

def setup_training_environment():
    """Setup directories and environment for training"""
    
    # Create results directory
    results_dir = "Unity_ML_Project/Training/results"
    os.makedirs(results_dir, exist_ok=True)
    
    # Create logs directory
    logs_dir = "Unity_ML_Project/Training/logs"
    os.makedirs(logs_dir, exist_ok=True)
    
    print(f"✅ Training environment setup complete")
    print(f"📁 Results directory: {results_dir}")
    print(f"📁 Logs directory: {logs_dir}")
    
    return results_dir, logs_dir

results_dir, logs_dir = setup_training_environment()

### 5.2 Training Commands

In [None]:
# Generate training commands
def generate_training_commands():
    """Generate different training command examples"""
    
    configs = "Unity_ML_Project/Training/configs"
    # Write results where the TensorBoard and export cells look for them
    # (mlagents-learn defaults to ./results)
    results = "--results-dir=Unity_ML_Project/Training/results"
    
    # Training is the default mode in current ML-Agents releases, so the old --train flag is omitted
    commands = {
        "basic_training": {
            "description": "Basic training with Unity Editor (press Play when prompted)",
            "command": f"mlagents-learn {configs}/simple_agent_config.yaml --run-id=basic_run {results}"
        },
        "build_training": {
            "description": "Training with built executable",
            "command": f"mlagents-learn {configs}/simple_agent_config.yaml --run-id=build_run --env=path/to/your/build.exe {results}"
        },
        "resume_training": {
            "description": "Resume previous training",
            "command": f"mlagents-learn {configs}/simple_agent_config.yaml --run-id=basic_run --resume {results}"
        },
        "advanced_training": {
            "description": "Advanced training with parallel environments (--num-envs above 1 needs a build)",
            "command": f"mlagents-learn {configs}/advanced_agent_config.yaml --run-id=advanced_run --env=path/to/your/build.exe --num-envs=4 --width=1920 --height=1080 {results}"
        },
        "sac_training": {
            "description": "SAC algorithm training",
            "command": f"mlagents-learn {configs}/sac_agent_config.yaml --run-id=sac_run {results}"
        }
    }
    
    print("🚀 Training Commands:")
    print("=" * 50)
    
    for name, info in commands.items():
        print(f"\n📋 {info['description']}:")
        print(f"   {info['command']}")
    
    return commands

training_commands = generate_training_commands()
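If you would rather launch training from Python than paste commands into a terminal, the command can be assembled as an argument list. The sketch below only builds and prints the list; `build_learn_command` is a hypothetical helper, and the flags it emits (`--run-id`, `--results-dir`, `--env`, `--num-envs`, `--resume`, `--no-graphics`) are standard `mlagents-learn` options.

```python
import shlex

def build_learn_command(config_path, run_id, results_dir="results",
                        env_path=None, num_envs=1, resume=False, no_graphics=False):
    """Assemble an mlagents-learn argument list (nothing is executed here)."""
    if num_envs > 1 and env_path is None:
        raise ValueError("running several environments requires a built executable (--env)")
    args = ["mlagents-learn", config_path, f"--run-id={run_id}", f"--results-dir={results_dir}"]
    if env_path is not None:
        args.append(f"--env={env_path}")
    if num_envs > 1:
        args.append(f"--num-envs={num_envs}")
    if resume:
        args.append("--resume")
    if no_graphics:
        args.append("--no-graphics")
    return args

cmd = build_learn_command(
    "Unity_ML_Project/Training/configs/simple_agent_config.yaml",
    run_id="basic_run",
    results_dir="Unity_ML_Project/Training/results",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.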

### 5.3 Training Monitoring

In [None]:
# TensorBoard monitoring setup
def setup_tensorboard_monitoring():
    """Setup TensorBoard for training monitoring"""
    
    tensorboard_command = "tensorboard --logdir=Unity_ML_Project/Training/results --port=6006"
    
    print("📊 TensorBoard Monitoring Setup:")
    print("=" * 40)
    print(f"Command: {tensorboard_command}")
    print("\n📈 Key Metrics to Monitor:")
    print("- Environment/Cumulative Reward")
    print("- Environment/Episode Length")
    print("- Policy/Learning Rate")
    print("- Policy/Entropy")
    print("- Losses/Value Loss")
    print("- Losses/Policy Loss")
    
    print("\n🌐 Access TensorBoard at: http://localhost:6006")
    
    return tensorboard_command

tensorboard_cmd = setup_tensorboard_monitoring()

### 5.4 Training Progress Tracking

In [None]:
# Training progress analysis
import matplotlib.pyplot as plt
import numpy as np

def analyze_training_progress(run_id="basic_run"):
    """Plot illustrative training curves (synthetic placeholder data, not real results)"""
    
    # NOTE: run_id is not used yet. A real implementation would load the
    # TensorBoard event files written for that run; the curves below are random.
    
    # Sample training data
    steps = np.arange(0, 100000, 1000)
    rewards = np.random.normal(0.5, 0.3, len(steps)).cumsum() * 0.01
    episode_lengths = np.random.normal(100, 20, len(steps))
    
    # Create plots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
    # Reward plot
    ax1.plot(steps, rewards, 'b-', linewidth=2)
    ax1.set_title('Training Progress: Cumulative Reward')
    ax1.set_xlabel('Training Steps')
    ax1.set_ylabel('Cumulative Reward')
    ax1.grid(True, alpha=0.3)
    
    # Episode length plot
    ax2.plot(steps, episode_lengths, 'r-', linewidth=2)
    ax2.set_title('Training Progress: Episode Length')
    ax2.set_xlabel('Training Steps')
    ax2.set_ylabel('Episode Length')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('Unity_ML_Project/Training/training_progress.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Training statistics
    stats = {
        "final_reward": float(rewards[-1]),
        "max_reward": float(np.max(rewards)),
        "avg_episode_length": float(np.mean(episode_lengths)),
        "total_steps": int(steps[-1])
    }
    
    print("📊 Training Statistics:")
    print(f"   Final Reward: {stats['final_reward']:.3f}")
    print(f"   Max Reward: {stats['max_reward']:.3f}")
    print(f"   Avg Episode Length: {stats['avg_episode_length']:.1f}")
    print(f"   Total Steps: {stats['total_steps']:,}")
    
    return stats

# Run analysis
training_stats = analyze_training_progress()
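To replace the synthetic curves with real data, the scalars that ML-Agents logs for TensorBoard can be read back with TensorBoard's `EventAccumulator`. The sketch below is written under assumptions: the event files are expected in a per-behavior folder such as `results/<run_id>/SimpleAgent`, and `load_scalar` and `smooth` are hypothetical helper names.

```python
def smooth(values, weight=0.9):
    """Exponential moving average, similar in spirit to TensorBoard's smoothing slider."""
    smoothed, last = [], None
    for v in values:
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

def load_scalar(run_dir, tag="Environment/Cumulative Reward"):
    """Read (step, value) pairs for one tag from the event files under run_dir."""
    # Imported lazily so the smoothing helper works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
    accumulator = EventAccumulator(run_dir)
    accumulator.Reload()
    return [(event.step, event.value) for event in accumulator.Scalars(tag)]

print(smooth([0.0, 1.0, 1.0, 1.0], weight=0.5))
# Example (assumed folder layout):
# points = load_scalar("Unity_ML_Project/Training/results/basic_run/SimpleAgent")
```

The returned steps and values can be passed straight to `ax1.plot` in place of the random arrays.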

---

## 6. ONNX Model Export and Management

### 6.1 ONNX Export Process

In [None]:
# ONNX model export utilities
import onnx
import onnxruntime as ort
import shutil

def export_model_to_onnx(run_id="basic_run", behavior_name="SimpleAgent"):
    """Export trained model to ONNX format"""
    
    # Paths
    results_path = f"Unity_ML_Project/Training/results/{run_id}"
    onnx_path = f"Unity_ML_Project/Models/{behavior_name}_{run_id}.onnx"
    
    print(f"🔄 Exporting model to ONNX format...")
    print(f"   Source: {results_path}")
    print(f"   Target: {onnx_path}")
    
    # ML-Agents automatically generates ONNX files during training
    # The ONNX file will be in the results directory
    source_onnx = f"{results_path}/{behavior_name}.onnx"
    
    try:
        # Copy ONNX file to Models directory
        shutil.copy2(source_onnx, onnx_path)
        print(f"✅ Model exported successfully!")
        
        # Verify ONNX model
        model = onnx.load(onnx_path)
        onnx.checker.check_model(model)
        print(f"✅ ONNX model validation passed!")
        
        return onnx_path
    
    except FileNotFoundError:
        print(f"⚠️ ONNX file not found at {source_onnx}")
        print(f"   Make sure training completed successfully")
        return None

def analyze_onnx_model(onnx_path):
    """Analyze ONNX model structure"""
    
    try:
        model = onnx.load(onnx_path)
        
        print(f"🔍 ONNX Model Analysis:")
        print(f"   Model version: {model.model_version}")
        print(f"   Producer: {model.producer_name} {model.producer_version}")
        print(f"   Graph name: {model.graph.name}")
        
        def describe_shape(value_info):
            # Dynamic dimensions carry a symbolic name (dim_param) instead of a number
            return [d.dim_value if d.HasField("dim_value") else (d.dim_param or "?")
                    for d in value_info.type.tensor_type.shape.dim]
        
        # Input information
        print(f"\n📥 Model Inputs:")
        for inp in model.graph.input:
            print(f"   - {inp.name}: {describe_shape(inp)}")
        
        # Output information
        print(f"\n📤 Model Outputs:")
        for out in model.graph.output:
            print(f"   - {out.name}: {describe_shape(out)}")
        
        # Node count
        print(f"\n🧠 Model Structure:")
        print(f"   Total nodes: {len(model.graph.node)}")
        
        # Node types
        node_types = {}
        for node in model.graph.node:
            node_types[node.op_type] = node_types.get(node.op_type, 0) + 1
        
        print(f"   Node types:")
        for op_type, count in sorted(node_types.items()):
            print(f"     - {op_type}: {count}")
    
    except Exception as e:
        print(f"❌ Error analyzing ONNX model: {e}")

# Example usage (uncomment when you have trained models)
# onnx_model_path = export_model_to_onnx("basic_run", "SimpleAgent")
# if onnx_model_path:
#     analyze_onnx_model(onnx_model_path)

print("✅ ONNX export utilities ready!")
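Besides the final `<behavior>.onnx`, ML-Agents keeps intermediate checkpoints (see `keep_checkpoints` in the configs). The sketch below picks the checkpoint with the highest step count. It assumes checkpoints are named `<behavior>-<steps>.onnx` inside a `<behavior>` subfolder of the run directory, which may differ between releases, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

def latest_checkpoint(run_dir, behavior_name="SimpleAgent"):
    """Return the checkpoint .onnx path with the largest step count, or None."""
    folder = Path(run_dir, behavior_name)
    if not folder.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(behavior_name)}-(\d+)\.onnx$")
    best_path, best_step = None, -1
    for path in folder.glob("*.onnx"):
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_path, best_step = path, int(match.group(1))
    return best_path

print(latest_checkpoint("Unity_ML_Project/Training/results/basic_run"))
```

Comparing the step counts numerically matters here, because sorting the file names as strings would rank `-9999` above `-50000`.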

### 6.2 ONNX Runtime Testing

In [None]:
# Test ONNX model with ONNX Runtime
def test_onnx_model(onnx_path, test_input=None):
    """Test ONNX model inference"""
    
    try:
        # Load ONNX Runtime session
        session = ort.InferenceSession(onnx_path)
        
        print(f"🧪 Testing ONNX model: {onnx_path}")
        
        # Get model input/output info (ML-Agents models can have several inputs)
        input_infos = session.get_inputs()
        output_info = session.get_outputs()[0]
        
        for info in input_infos:
            print(f"   Input: {info.name} {info.shape}")
        print(f"   Output: {output_info.name} {output_info.shape}")
        
        # Create test inputs if not provided. ONNX Runtime reports dynamic
        # dimensions as strings or None, so those are replaced with 1.
        # Assumes every input is float32.
        if test_input is None:
            test_input = {}
            for info in input_infos:
                shape = [d if isinstance(d, int) and d > 0 else 1 for d in info.shape]
                test_input[info.name] = np.random.randn(*shape).astype(np.float32)
                print(f"   Generated test input {info.name}: shape {shape}")
        elif not isinstance(test_input, dict):
            # A bare array is fed to the first input
            test_input = {input_infos[0].name: test_input}
        
        # Run inference
        outputs = session.run([output_info.name], test_input)
        
        print(f"✅ Inference successful!")
        print(f"   Output shape: {outputs[0].shape}")
        print(f"   Output sample: {outputs[0].flatten()[:5]}")
        
        return outputs[0]
    
    except Exception as e:
        print(f"❌ Error testing ONNX model: {e}")
        return None

# Example usage
print("✅ ONNX testing utilities ready!")
print("   Use test_onnx_model('path/to/model.onnx') to test your models")

### 6.3 Unity ONNX Integration

In [None]:
# Unity integration instructions for ONNX models
unity_integration_guide = '''
# Unity ONNX Model Integration Guide

## Step 1: Import ONNX Model to Unity
1. Drag your .onnx file into Unity's Assets folder
2. Unity will automatically import it as a Model asset
3. Recommended location: Assets/ML-Agents/Models/

## Step 2: Configure Behavior Parameters
1. Select your Agent GameObject
2. In Behavior Parameters component:
   - Set "Behavior Type" to "Inference Only"
   - Drag your .onnx file to the "Model" field
   - Ensure "Use Child Sensors" is checked if using sensors

## Step 3: Keep the Decision Requester
1. Leave the "Decision Requester" component enabled; inference still needs it to request decisions
2. Your agent will now use the trained model for decisions

## Step 4: Test Model Performance
1. Enter Play mode
2. Observe agent behavior
3. Monitor performance in Profiler if needed

## Troubleshooting:
- Ensure observation space matches training configuration
- Check action space dimensions
- Verify normalization settings match training
- Test with Heuristic mode first to validate environment
'''

# Save integration guide
with open('Unity_ML_Project/Unity_ONNX_Integration_Guide.md', 'w') as f:
    f.write(unity_integration_guide)

print("✅ Unity ONNX integration guide created!")
print("📁 Saved to: Unity_ML_Project/Unity_ONNX_Integration_Guide.md")
print("\n📋 Integration Steps:")
print(unity_integration_guide)

---

## 7. Advanced Training Techniques

### 7.1 Curriculum Learning

In [None]:
# Curriculum learning configuration
curriculum_config = '''
behaviors:
  SimpleAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5

# Curriculum Learning Configuration
# Each lesson sets the environment parameter "target_distance"; the agent script
# can read it via Academy.Instance.EnvironmentParameters.GetWithDefault(...)
environment_parameters:
  target_distance:
    curriculum:
      - name: lesson1
        completion_criteria:
          measure: progress
          behavior: SimpleAgent
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.2
        value: 2.0
      - name: lesson2
        completion_criteria:
          measure: progress
          behavior: SimpleAgent
          signal_smoothing: true
          min_lesson_length: 100
          threshold: 0.5
        value: 4.0
      - name: lesson3
        value: 6.0
'''

# Save curriculum configuration
with open('Unity_ML_Project/Training/configs/curriculum_config.yaml', 'w') as f:
    f.write(curriculum_config)

print("✅ Curriculum learning configuration created!")
print("📁 Saved to: Unity_ML_Project/Training/configs/curriculum_config.yaml")
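With `measure: progress`, a lesson is completed once the ratio of steps taken to `max_steps` passes that lesson's `threshold`. The sketch below shows how the parameter value would change over a run. It is a simplification that ignores `min_lesson_length` and `signal_smoothing`, and `lesson_value` is a hypothetical helper rather than ML-Agents code.

```python
# (threshold to leave the lesson, parameter value); None marks the final lesson
lessons = [(0.2, 2.0), (0.5, 4.0), (None, 6.0)]

def lesson_value(step, max_steps=500_000):
    """Return the target_distance value in effect at a given training step."""
    progress = step / max_steps
    for threshold, value in lessons:
        if threshold is None or progress <= threshold:
            return value
    return lessons[-1][1]

for step in (0, 150_000, 300_000):
    print(f"step {step:>7,}: target_distance = {lesson_value(step)}")
```

With `measure: reward` instead, the same thresholds would be compared against the mean episode reward, so lessons advance only when the agent actually improves.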

### 7.2 Multi-Agent Training

In [None]:
# Multi-agent training configuration
multiagent_config = '''
behaviors:
  SimpleAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 10000
    keep_checkpoints: 5
    
  CompetitorAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 10000
    keep_checkpoints: 5

# Environment settings (self-play itself would need a self_play section under each behavior)
env_settings:
  env_path: ./builds/MultiAgent
  env_args: null
  base_port: 5005
  num_envs: 1
  num_areas: 1
  seed: -1
  max_lifetime_restarts: 10
  restarts_rate_limit_n: 1
  restarts_rate_limit_period_s: 60
'''

# Save multi-agent configuration
with open('Unity_ML_Project/Training/configs/multiagent_config.yaml', 'w') as f:
    f.write(multiagent_config)

print("✅ Multi-agent training configuration created!")
print("📁 Saved to: Unity_ML_Project/Training/configs/multiagent_config.yaml")

### 7.3 Hyperparameter Optimization

In [None]:
# Hyperparameter optimization utilities
import itertools
import yaml

def generate_hyperparameter_sweep():
    """Generate multiple configurations for hyperparameter optimization"""
    
    # Define parameter ranges
    param_ranges = {
        'learning_rate': [1e-4, 3e-4, 1e-3],
        'batch_size': [64, 128, 256],
        'hidden_units': [128, 256, 512],
        'num_layers': [2, 3, 4],
        'buffer_size': [2048, 10240, 20480]
    }
    
    # Base configuration
    base_config = {
        'behaviors': {
            'SimpleAgent': {
                'trainer_type': 'ppo',
                'hyperparameters': {
                    'beta': 5.0e-4,
                    'epsilon': 0.2,
                    'lambd': 0.99,
                    'num_epoch': 3
                },
                'network_settings': {
                    'normalize': False
                },
                'reward_signals': {
                    'extrinsic': {
                        'gamma': 0.99,
                        'strength': 1.0
                    }
                },
                'max_steps': 200000,
                'time_horizon': 64,
                'summary_freq': 10000,
                'keep_checkpoints': 3
            }
        }
    }
    
    # Generate a limited grid (avoid combinatorial explosion)
    configs = []
    config_id = 0
    
    # Deep-copy the base config for each run so nested dicts are not shared
    import copy
    
    for lr in param_ranges['learning_rate']:
        for batch_size in param_ranges['batch_size']:
            for hidden_units in param_ranges['hidden_units'][:2]:  # Limit to reduce combinations
                config = copy.deepcopy(base_config)
                config['behaviors']['SimpleAgent']['hyperparameters']['learning_rate'] = lr
                config['behaviors']['SimpleAgent']['hyperparameters']['batch_size'] = batch_size
                config['behaviors']['SimpleAgent']['network_settings']['hidden_units'] = hidden_units
                
                # Save configuration
                config_filename = f'Unity_ML_Project/Training/configs/sweep_config_{config_id:03d}.yaml'
                with open(config_filename, 'w') as f:
                    yaml.dump(config, f, default_flow_style=False)
                
                configs.append({
                    'id': config_id,
                    'filename': config_filename,
                    'params': {
                        'learning_rate': lr,
                        'batch_size': batch_size,
                        'hidden_units': hidden_units
                    }
                })
                
                config_id += 1
    
    print(f"✅ Generated {len(configs)} hyperparameter configurations")
    
    # Generate training script
    training_script = "#!/bin/bash\n\n"
    training_script += "# Hyperparameter sweep training script\n\n"
    
    for config in configs:
        run_id = f"sweep_run_{config['id']:03d}"
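        # Double quotes keep the shell quoting intact: the params dict repr contains single quotes
        training_script += f"echo \"Starting training {config['id']}: {config['params']}\"\n"
        training_script += f"mlagents-learn {config['filename']} --run-id={run_id} --results-dir=Unity_ML_Project/Training/results\n\n"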
    
    with open('Unity_ML_Project/Training/run_hyperparameter_sweep.sh', 'w') as f:
        f.write(training_script)
    
    print("✅ Hyperparameter sweep script created!")
    print("📁 Saved to: Unity_ML_Project/Training/run_hyperparameter_sweep.sh")
    
    return configs

# Generate sweep configurations
sweep_configs = generate_hyperparameter_sweep()
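The nested loops above cover a fixed slice of the grid. An alternative is to sample random combinations from the full grid, which also exercises `num_layers` and `buffer_size`. The following is a minimal sketch using only the standard library; `sample_configs` is a hypothetical helper and the ranges are copied from `param_ranges`.

```python
import itertools
import random

param_ranges = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128, 256],
    "hidden_units": [128, 256, 512],
    "num_layers": [2, 3, 4],
    "buffer_size": [2048, 10240, 20480],
}

def sample_configs(ranges, n_samples, seed=42):
    """Draw n_samples distinct parameter combinations from the full grid."""
    names = list(ranges)
    grid = list(itertools.product(*(ranges[name] for name in names)))
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    chosen = rng.sample(grid, min(n_samples, len(grid)))
    return [dict(zip(names, combo)) for combo in chosen]

samples = sample_configs(param_ranges, n_samples=5)
print(f"{len(samples)} of {3 ** 5} possible combinations:")
for params in samples:
    print(params)
```

Each sampled dictionary can be merged into a deep copy of `base_config` in the same way the loop above does.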

---

## 8. Model Comparison and Analysis

### 8.1 Training Results Comparison

In [None]:
# Model comparison utilities
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def compare_training_results(run_ids):
    """Compare results from multiple training runs"""
    
    # Sample data for demonstration
    comparison_data = []
    
    for i, run_id in enumerate(run_ids):
        # In real scenario, you'd load actual training data
        # Here we generate sample data
        final_reward = np.random.normal(0.8, 0.2)
        max_reward = final_reward + np.random.uniform(0.1, 0.3)
        training_time = np.random.uniform(30, 120)  # minutes
        convergence_step = np.random.randint(50000, 200000)
        
        comparison_data.append({
            'run_id': run_id,
            'final_reward': final_reward,
            'max_reward': max_reward,
            'training_time_min': training_time,
            'convergence_step': convergence_step,
            'efficiency': final_reward / training_time
        })
    
    df = pd.DataFrame(comparison_data)
    
    # Create comparison plots
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Final reward comparison
    axes[0, 0].bar(df['run_id'], df['final_reward'], color='skyblue')
    axes[0, 0].set_title('Final Reward Comparison')
    axes[0, 0].set_ylabel('Final Reward')
    axes[0, 0].tick_params(axis='x', rotation=45)
    
    # Training time comparison
    axes[0, 1].bar(df['run_id'], df['training_time_min'], color='lightcoral')
    axes[0, 1].set_title('Training Time Comparison')
    axes[0, 1].set_ylabel('Training Time (minutes)')
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Convergence comparison
    axes[1, 0].bar(df['run_id'], df['convergence_step'], color='lightgreen')
    axes[1, 0].set_title('Convergence Speed')
    axes[1, 0].set_ylabel('Steps to Convergence')
    axes[1, 0].tick_params(axis='x', rotation=45)
    
    # Efficiency scatter
    axes[1, 1].scatter(df['training_time_min'], df['final_reward'], s=100, alpha=0.7)
    axes[1, 1].set_title('Training Efficiency')
    axes[1, 1].set_xlabel('Training Time (minutes)')
    axes[1, 1].set_ylabel('Final Reward')
    
    # Add run_id labels to scatter plot
    for i, run_id in enumerate(df['run_id']):
        axes[1, 1].annotate(run_id, (df['training_time_min'].iloc[i], df['final_reward'].iloc[i]),
                           xytext=(5, 5), textcoords='offset points', fontsize=8)
    
    plt.tight_layout()
    plt.savefig('Unity_ML_Project/Training/model_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Print summary table
    print("📊 Training Results Comparison:")
    print("=" * 80)
    print(df.round(3).to_string(index=False))
    
    # Find best model
    best_reward_idx = df['final_reward'].idxmax()
    best_efficiency_idx = df['efficiency'].idxmax()
    
    print(f"\n🏆 Best Performance:")
    print(f"   Highest Reward: {df.loc[best_reward_idx, 'run_id']} ({df.loc[best_reward_idx, 'final_reward']:.3f})")
    print(f"   Best Efficiency: {df.loc[best_efficiency_idx, 'run_id']} ({df.loc[best_efficiency_idx, 'efficiency']:.4f})")
    
    return df

# Example comparison
sample_runs = ['basic_run', 'advanced_run', 'sac_run', 'curriculum_run']
comparison_results = compare_training_results(sample_runs)

### 8.2 Model Performance Benchmarking

In [None]:
# Benchmarking utilities
import time

import numpy as np
import onnx
import onnxruntime as ort

def benchmark_onnx_inference(onnx_path, num_inferences=1000):
    """Benchmark ONNX model inference speed"""
    
    try:
        # Load model
        session = ort.InferenceSession(onnx_path)
        input_info = session.get_inputs()[0]
        
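        # Prepare test input. ONNX Runtime reports dynamic dimensions as
        # strings or None rather than -1, so replace any non-integer or
        # non-positive dimension with 1.
        input_shape = [
            dim if isinstance(dim, int) and dim > 0 else 1
            for dim in input_info.shape
        ]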
        
        test_input = np.random.randn(*input_shape).astype(np.float32)
        
        print(f"🚀 Benchmarking ONNX model: {onnx_path}")
        print(f"   Input shape: {input_shape}")
        print(f"   Number of inferences: {num_inferences}")
        
        # Warm-up runs
        for _ in range(10):
            session.run(None, {input_info.name: test_input})
        
        # Benchmark (perf_counter is a monotonic, high-resolution clock)
        start_time = time.perf_counter()

        for _ in range(num_inferences):
            session.run(None, {input_info.name: test_input})

        end_time = time.perf_counter()
        
        # Calculate metrics
        total_time = end_time - start_time
        avg_inference_time = (total_time / num_inferences) * 1000  # ms
        inferences_per_second = num_inferences / total_time
        
        print(f"\n📊 Benchmark Results:")
        print(f"   Total time: {total_time:.3f} seconds")
        print(f"   Average inference time: {avg_inference_time:.3f} ms")
        print(f"   Inferences per second: {inferences_per_second:.1f}")
        
        # Performance rating
        if avg_inference_time < 1.0:
            rating = "🟢 Excellent (< 1ms)"
        elif avg_inference_time < 5.0:
            rating = "🟡 Good (1-5ms)"
        elif avg_inference_time < 10.0:
            rating = "🟠 Acceptable (5-10ms)"
        else:
            rating = "🔴 Slow (> 10ms)"
        
        print(f"   Performance rating: {rating}")
        
        return {
            'total_time': total_time,
            'avg_inference_time_ms': avg_inference_time,
            'inferences_per_second': inferences_per_second
        }
    
    except Exception as e:
        print(f"❌ Benchmarking failed: {e}")
        return None

# Memory usage analysis
def analyze_model_size(onnx_path):
    """Analyze ONNX model size and complexity"""
    
    try:
        import os
        
        # File size
        file_size_bytes = os.path.getsize(onnx_path)
        file_size_mb = file_size_bytes / (1024 * 1024)
        
        # Load model for analysis
        model = onnx.load(onnx_path)
        
        # Count parameters
        total_params = 0
        for initializer in model.graph.initializer:
            param_size = 1
            for dim in initializer.dims:
                param_size *= dim
            total_params += param_size
        
        print(f"📏 Model Size Analysis:")
        print(f"   File size: {file_size_mb:.2f} MB ({file_size_bytes:,} bytes)")
        print(f"   Total parameters: {total_params:,}")
        print(f"   Model nodes: {len(model.graph.node)}")
        
        # Size category
        if file_size_mb < 1:
            size_category = "🟢 Small (< 1MB)"
        elif file_size_mb < 10:
            size_category = "🟡 Medium (1-10MB)"
        elif file_size_mb < 100:
            size_category = "🟠 Large (10-100MB)"
        else:
            size_category = "🔴 Very Large (> 100MB)"
        
        print(f"   Size category: {size_category}")
        
        return {
            'file_size_mb': file_size_mb,
            'total_parameters': total_params,
            'node_count': len(model.graph.node)
        }
    
    except Exception as e:
        print(f"❌ Size analysis failed: {e}")
        return None

print("✅ Benchmarking utilities ready!")
print("   Use benchmark_onnx_inference('path/to/model.onnx') to test performance")
print("   Use analyze_model_size('path/to/model.onnx') to analyze model size")
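
The benchmark above reports only the average inference time, which can hide occasional slow calls that matter for real-time use. The following is a minimal sketch of collecting per-call timings and summarizing them with a median and a 95th percentile. `time_calls` and `summarize_latencies` are hypothetical helper names introduced here for illustration; they are not part of ONNX Runtime or ML-Agents.

```python
import statistics
import time


def time_calls(fn, num_calls=1000, warmup=10):
    """Return per-call wall-clock durations of fn() in milliseconds."""
    for _ in range(warmup):
        fn()
    durations_ms = []
    for _ in range(num_calls):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize_latencies(durations_ms):
    """Summarize latencies (ms) with mean, median, p95 and max."""
    ordered = sorted(durations_ms)
    # Nearest-rank 95th percentile, computed with integer arithmetic
    p95_index = max(0, (95 * len(ordered) + 99) // 100 - 1)
    return {
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }
```

With a loaded session, the callable could be something like `lambda: session.run(None, {input_info.name: test_input})`, reusing the names from `benchmark_onnx_inference`.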

---

## 9. Troubleshooting and Best Practices

### 9.1 Common Issues and Solutions

In [None]:
# Troubleshooting guide
troubleshooting_guide = '''
# Unity ML-Agents Troubleshooting Guide

## 1. Training Issues

### Problem: Training doesn't start
**Symptoms:**
- Command hangs after "Start training by pressing the Play button in the Unity Editor"
- No connection between Unity and Python

**Solutions:**
- Ensure Unity is in Play mode
- Check that Behavior Name in Unity matches YAML config
- Verify no firewall blocking communication
- Try different port with --base-port parameter

### Problem: Slow or no learning
**Symptoms:**
- Reward stays flat or decreases
- Agent behavior doesn't improve

**Solutions:**
- Review reward function design
- Check observation space completeness
- Adjust learning rate (try 1e-4 to 1e-3)
- Increase training steps
- Use reward shaping techniques

### Problem: Training crashes
**Symptoms:**
- Python process terminates unexpectedly
- Out of memory errors

**Solutions:**
- Reduce batch_size and buffer_size
- Decrease number of parallel environments
- Check for infinite loops in agent code
- Monitor system resources

## 2. ONNX Export Issues

### Problem: ONNX file not generated
**Solutions:**
- Ensure training completed successfully
- Check results directory permissions
- Verify ONNX package installation

### Problem: ONNX model doesn't work in Unity
**Solutions:**
- Verify observation space matches exactly
- Check action space configuration
- Ensure normalization settings match
- Test with Heuristic mode first

## 3. Performance Issues

### Problem: Slow inference in Unity
**Solutions:**
- Reduce network size (hidden_units, num_layers)
- Optimize observation collection
- Use Decision Requester with appropriate frequency
- Profile with Unity Profiler

## 4. Configuration Issues

### Problem: YAML parsing errors
**Solutions:**
- Check YAML syntax (indentation, colons)
- Validate with online YAML validator
- Use consistent spacing (2 or 4 spaces)

### Problem: Hyperparameter tuning difficulties
**Solutions:**
- Start with proven configurations
- Change one parameter at a time
- Use curriculum learning for complex tasks
- Monitor TensorBoard metrics
'''

# Save troubleshooting guide
with open('Unity_ML_Project/Troubleshooting_Guide.md', 'w') as f:
    f.write(troubleshooting_guide)

print("✅ Troubleshooting guide created!")
print("📁 Saved to: Unity_ML_Project/Troubleshooting_Guide.md")
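
The "try a different port" advice in the guide can be checked before launching training. The sketch below uses only the standard library to test whether a TCP port can be bound locally. The default start value of 5005 is an assumption about the ML-Agents default base port and should be confirmed against your installed version; `is_port_free` and `first_free_port` are hypothetical helper names.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=5005, attempts=20):
    """Return the first bindable port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port):
            return port
    return None
```

A port found this way can be passed to `mlagents-learn` through the `--base-port` parameter mentioned in the guide. The check is a snapshot, so another process could still claim the port before training starts.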

### 9.2 Best Practices Checklist

In [None]:
# Best practices checklist
def generate_best_practices_checklist():
    """Generate comprehensive best practices checklist"""
    
    checklist = {
        "Environment Design": [
            "✓ Clear success/failure conditions defined",
            "✓ Appropriate episode length (not too short/long)",
            "✓ Randomized starting conditions",
            "✓ Proper physics settings and constraints",
            "✓ Visual debugging aids available"
        ],
        "Agent Implementation": [
            "✓ Comprehensive observation space",
            "✓ Normalized observations when needed",
            "✓ Well-designed reward function",
            "✓ Proper action space dimensionality",
            "✓ Heuristic function for testing"
        ],
        "Training Configuration": [
            "✓ Behavior name matches between Unity and YAML",
            "✓ Appropriate hyperparameters for task complexity",
            "✓ Sufficient training steps allocated",
            "✓ Regular checkpoint saving enabled",
            "✓ TensorBoard logging configured"
        ],
        "Reward Engineering": [
            "✓ Positive rewards for desired behaviors",
            "✓ Negative rewards for undesired behaviors",
            "✓ Sparse rewards supplemented with dense rewards",
            "✓ Reward magnitude scaling appropriate",
            "✓ Terminal rewards clearly defined"
        ],
        "Training Process": [
            "✓ Start with simple scenarios",
            "✓ Monitor training progress regularly",
            "✓ Save models at key milestones",
            "✓ Test intermediate models",
            "✓ Document hyperparameter experiments"
        ],
        "Model Deployment": [
            "✓ ONNX model exported successfully",
            "✓ Model performance benchmarked",
            "✓ Inference speed acceptable for real-time use",
            "✓ Model behavior validated in target environment",
            "✓ Fallback mechanisms implemented"
        ],
        "Optimization": [
            "✓ Network architecture optimized for task",
            "✓ Training time vs performance balanced",
            "✓ Multiple training runs compared",
            "✓ Hyperparameter sensitivity analyzed",
            "✓ Curriculum learning considered for complex tasks"
        ]
    }
    
    print("📋 Unity ML-Agents Best Practices Checklist")
    print("=" * 60)
    
    checklist_text = "# Unity ML-Agents Best Practices Checklist\n\n"
    
    for category, items in checklist.items():
        print(f"\n🎯 {category}:")
        checklist_text += f"## {category}\n\n"
        
        for item in items:
            print(f"   {item}")
            checklist_text += f"- [ ] {item[2:]}\n"  # Remove the ✓ for markdown
        
        checklist_text += "\n"
    
    # Save checklist
    with open('Unity_ML_Project/Best_Practices_Checklist.md', 'w') as f:
        f.write(checklist_text)
    
    print(f"\n✅ Best practices checklist saved!")
    print(f"📁 Saved to: Unity_ML_Project/Best_Practices_Checklist.md")
    
    return checklist

# Generate checklist
best_practices = generate_best_practices_checklist()
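
The checklist item "Normalized observations when needed" refers to keeping input features on a comparable scale. ML-Agents can normalize vector observations itself when `normalize: true` is set under `network_settings`, so the sketch below is only an illustration of the idea, not the library's implementation. It keeps a running per-feature mean and variance (Welford's method); `RunningNormalizer` is a hypothetical class name.

```python
import math


class RunningNormalizer:
    """Track per-feature running mean/variance and normalize observations."""

    def __init__(self, size, epsilon=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean
        self.epsilon = epsilon

    def update(self, observation):
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, value in enumerate(observation):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def normalize(self, observation):
        """Return the observation shifted and scaled by the running statistics."""
        if self.count < 2:
            # Not enough data for a variance estimate; pass values through
            return list(observation)
        result = []
        for i, value in enumerate(observation):
            std = math.sqrt(self.m2[i] / self.count)  # population std
            result.append((value - self.mean[i]) / (std + self.epsilon))
        return result
```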

### 9.3 Validation and Testing Framework

In [None]:
# Validation and testing framework
import os

def create_validation_framework():
    """Create comprehensive validation framework for ML-Agents projects"""
    
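    # Raw string: keeps "\n" escapes literal so the generated script stays valid Python.
    # The shebang must be the very first line of the written file.
    validation_script = r'''#!/usr/bin/env python3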
"""Unity ML-Agents Project Validation Script"""

import os
import yaml
import onnx
import onnxruntime as ort
import numpy as np

class MLAgentsValidator:
    def __init__(self, project_path):
        self.project_path = project_path
        self.results = []
    
    def validate_project_structure(self):
        """Validate project directory structure"""
        required_dirs = [
            'Training/configs',
            'Training/results',
            'Models',
            'Scripts'
        ]
        
        print("🔍 Validating project structure...")
        
        for dir_path in required_dirs:
            full_path = os.path.join(self.project_path, dir_path)
            if os.path.exists(full_path):
                self.results.append(f"✅ Directory exists: {dir_path}")
            else:
                self.results.append(f"❌ Missing directory: {dir_path}")
    
    def validate_config_files(self):
        """Validate YAML configuration files"""
        config_dir = os.path.join(self.project_path, 'Training/configs')
        
        print("🔍 Validating configuration files...")
        
        if not os.path.exists(config_dir):
            self.results.append("❌ Config directory not found")
            return
        
        yaml_files = [f for f in os.listdir(config_dir) if f.endswith('.yaml')]
        
        for yaml_file in yaml_files:
            try:
                with open(os.path.join(config_dir, yaml_file), 'r') as f:
                    config = yaml.safe_load(f)
                
                # Validate structure
                if 'behaviors' in config:
                    self.results.append(f"✅ Valid config: {yaml_file}")
                else:
                    self.results.append(f"❌ Invalid config structure: {yaml_file}")
            
            except Exception as e:
                self.results.append(f"❌ Config parse error in {yaml_file}: {str(e)}")
    
    def validate_onnx_models(self):
        """Validate ONNX model files"""
        models_dir = os.path.join(self.project_path, 'Models')
        
        print("🔍 Validating ONNX models...")
        
        if not os.path.exists(models_dir):
            self.results.append("❌ Models directory not found")
            return
        
        onnx_files = [f for f in os.listdir(models_dir) if f.endswith('.onnx')]
        
        if not onnx_files:
            self.results.append("⚠️ No ONNX models found")
            return
        
        for onnx_file in onnx_files:
            try:
                model_path = os.path.join(models_dir, onnx_file)
                
                # Load and validate
                model = onnx.load(model_path)
                onnx.checker.check_model(model)
                
                # Confirm ONNX Runtime can load the model (no inference is run here)
                ort.InferenceSession(model_path)
                
                self.results.append(f"✅ Valid ONNX model: {onnx_file}")
            
            except Exception as e:
                self.results.append(f"❌ ONNX validation failed for {onnx_file}: {str(e)}")
    
    def run_validation(self):
        """Run complete validation suite"""
        print("🚀 Starting ML-Agents project validation...")
        print("=" * 60)
        
        self.validate_project_structure()
        self.validate_config_files()
        self.validate_onnx_models()
        
        print("\n📊 Validation Results:")
        print("=" * 40)
        
        for result in self.results:
            print(result)
        
        # Summary
        passed = len([r for r in self.results if r.startswith('✅')])
        failed = len([r for r in self.results if r.startswith('❌')])
        warnings = len([r for r in self.results if r.startswith('⚠️')])
        
        print(f"\n📈 Summary: {passed} passed, {failed} failed, {warnings} warnings")
        
        return failed == 0

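# Usage example
if __name__ == "__main__":
    # Resolve the project root from this file's location so the script works
    # from any working directory (it is saved inside Unity_ML_Project/).
    validator = MLAgentsValidator(os.path.dirname(os.path.abspath(__file__)))
    success = validator.run_validation()

    print()
    if success:
        print("🎉 Project validation passed!")
    else:
        print("⚠️ Project validation failed. Please fix the issues above.")

    # A non-zero exit status lets shell callers detect failure via $?
    raise SystemExit(0 if success else 1)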
    '''
    
    # Save validation script (UTF-8, because the script contains emoji)
    with open('Unity_ML_Project/validate_project.py', 'w', encoding='utf-8') as f:
        f.write(validation_script)
    
    # Make it executable
    os.chmod('Unity_ML_Project/validate_project.py', 0o755)
    
    print("✅ Validation framework created!")
    print("📁 Saved to: Unity_ML_Project/validate_project.py")
    print("\n🚀 Usage: python Unity_ML_Project/validate_project.py")

# Create validation framework
create_validation_framework()
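
`validate_config_files` only confirms that a `behaviors` key exists. A slightly deeper check on the already-parsed dictionary can catch common mistakes early. The sketch below is one possible set of rules, chosen here as assumptions rather than taken from ML-Agents itself: `buffer_size` should not be smaller than `batch_size`, and `max_steps` should be a positive number. `check_behavior_config` is a hypothetical helper name.

```python
def check_behavior_config(config):
    """Return a list of human-readable problems found in a parsed trainer config."""
    behaviors = config.get("behaviors") if isinstance(config, dict) else None
    if not isinstance(behaviors, dict) or not behaviors:
        return ["no behaviors defined"]

    problems = []
    for name, settings in behaviors.items():
        settings = settings or {}
        hyper = settings.get("hyperparameters") or {}
        batch = hyper.get("batch_size")
        buffer = hyper.get("buffer_size")
        # Assumed rule: the experience buffer must hold at least one batch
        if isinstance(batch, int) and isinstance(buffer, int) and buffer < batch:
            problems.append(
                f"{name}: buffer_size ({buffer}) is smaller than batch_size ({batch})"
            )

        max_steps = settings.get("max_steps")
        if max_steps is not None:
            try:
                # YAML may yield an int, a float, or a string such as "5e5"
                if float(max_steps) <= 0:
                    problems.append(f"{name}: max_steps must be positive")
            except (TypeError, ValueError):
                problems.append(f"{name}: max_steps is not a number")
    return problems
```

The function takes the dictionary returned by `yaml.safe_load`, so it could be called inside `validate_config_files` right after the `behaviors` check.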

---

## 10. Complete Workflow Summary

### 10.1 End-to-End Training Pipeline

In [None]:
# Complete workflow pipeline
import os

def create_complete_workflow():
    """Create a complete end-to-end workflow script"""
    
    workflow_script = '''#!/bin/bash
# Unity ML-Agents Complete Training Workflow
# This script demonstrates the complete training pipeline

echo "🚀 Unity ML-Agents Complete Training Workflow"
echo "============================================="

# Step 1: Validate project setup
echo "\n📋 Step 1: Validating project setup..."
python validate_project.py

if [ $? -ne 0 ]; then
    echo "❌ Project validation failed. Please fix issues before proceeding."
    exit 1
fi

# Step 2: Start TensorBoard monitoring
echo "\n📊 Step 2: Starting TensorBoard monitoring..."
tensorboard --logdir=Training/results --port=6006 &
TENSORBOARD_PID=$!
echo "TensorBoard available at: http://localhost:6006"

# Step 3: Run basic training
echo "\n🎯 Step 3: Starting basic training..."
mlagents-learn Training/configs/simple_agent_config.yaml --run-id=workflow_basic --train

# Step 4: Export ONNX model
echo "\n📦 Step 4: Model should be exported as ONNX automatically"
echo "Check Training/results/workflow_basic/ for SimpleAgent.onnx"

# Step 5: Validate ONNX model
echo "\n🔍 Step 5: Validating ONNX model..."
python -c "
import onnx
import onnxruntime as ort
try:
    model = onnx.load('Training/results/workflow_basic/SimpleAgent.onnx')
    onnx.checker.check_model(model)
    session = ort.InferenceSession('Training/results/workflow_basic/SimpleAgent.onnx')
    print('✅ ONNX model validation passed!')
except Exception as e:
    print(f'❌ ONNX validation failed: {e}')
"

# Step 6: Copy model to deployment directory
echo "\n📁 Step 6: Copying model to deployment directory..."
cp Training/results/workflow_basic/SimpleAgent.onnx Models/SimpleAgent_workflow.onnx
echo "✅ Model copied to Models/SimpleAgent_workflow.onnx"

# Step 7: Benchmark model
echo "\n⚡ Step 7: Benchmarking model performance..."
python -c "
import onnxruntime as ort
import numpy as np
import time

session = ort.InferenceSession('Models/SimpleAgent_workflow.onnx')
input_info = session.get_inputs()[0]
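# Build the test input from the declared model input shape (dynamic dims become 1)
shape = [d if isinstance(d, int) and d > 0 else 1 for d in input_info.shape]
test_input = np.random.randn(*shape).astype(np.float32)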

# Benchmark
start_time = time.time()
for _ in range(1000):
    outputs = session.run(None, {input_info.name: test_input})
end_time = time.time()

avg_time_ms = ((end_time - start_time) / 1000) * 1000
print(f'⚡ Average inference time: {avg_time_ms:.3f}ms')
"

# Step 8: Generate training report
echo "\n📊 Step 8: Generating training report..."
python - <<'PYEOF'
import os
from datetime import datetime

# Report what is actually on disk instead of assuming success
model_path = 'Models/SimpleAgent_workflow.onnx'
model_status = '✅ Found' if os.path.exists(model_path) else '❌ Not found'

report = f"""# Training Report - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Training Configuration
- Config: simple_agent_config.yaml
- Run ID: workflow_basic
- Algorithm: PPO

## Results
- ONNX Model: {model_status}
- Model Location: {model_path}
- Validation: see the Step 5 output above

## Next Steps
1. Import ONNX model to Unity
2. Configure Behavior Parameters
3. Test agent performance
4. Deploy to production environment

## Files Generated
- Training results: Training/results/workflow_basic/
- ONNX model: {model_path}
- Logs: Available in TensorBoard
"""

with open('Training_Report.md', 'w', encoding='utf-8') as f:
    f.write(report)

print('📊 Training report generated: Training_Report.md')
PYEOF

# Cleanup
echo "\n🧹 Cleaning up..."
kill $TENSORBOARD_PID 2>/dev/null

echo "\n🎉 Workflow completed successfully!"
echo "📁 Check the following files:"
echo "   - Models/SimpleAgent_workflow.onnx (ONNX model)"
echo "   - Training_Report.md (Training report)"
echo "   - Training/results/workflow_basic/ (Training results)"
    '''
    
    # Save workflow script (UTF-8, because the script contains emoji)
    with open('Unity_ML_Project/complete_workflow.sh', 'w', encoding='utf-8') as f:
        f.write(workflow_script)
    
    # Make executable
    os.chmod('Unity_ML_Project/complete_workflow.sh', 0o755)
    
    print("✅ Complete workflow script created!")
    print("📁 Saved to: Unity_ML_Project/complete_workflow.sh")
    print("\n🚀 Usage: cd Unity_ML_Project && ./complete_workflow.sh")

# Create complete workflow
create_complete_workflow()
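
The workflow script copies a model from a hard-coded path. When the exported file name or run directory differs, it can help to locate the model programmatically. The sketch below searches a run directory recursively and returns the most recently modified `.onnx` file. `find_latest_onnx` is a hypothetical helper, and choosing by modification time is an assumption about which file you want.

```python
from pathlib import Path


def find_latest_onnx(run_dir):
    """Return the most recently modified .onnx file under run_dir, or None."""
    candidates = list(Path(run_dir).rglob("*.onnx"))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

For example, `find_latest_onnx('Training/results/workflow_basic')` could supply the source path for the copy in Step 6.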

### 10.2 Project Summary and Final Checklist

In [None]:
# Generate final project summary
import os
from datetime import datetime

def generate_project_summary():
    """Generate comprehensive project summary"""
    
    # List all created files
    created_files = []
    for root, dirs, files in os.walk('Unity_ML_Project'):
        for file in files:
            created_files.append(os.path.join(root, file))
    
    summary = f'''
# Unity ML-Agents Complete Training Guide - Project Summary

## 📁 Generated Project Structure

```
Unity_ML_Project/
├── Scripts/
│   └── SimpleAgent.cs                    # C# Agent script for Unity
├── Training/
│   ├── configs/
│   │   ├── simple_agent_config.yaml      # Basic PPO configuration
│   │   ├── advanced_agent_config.yaml    # Advanced PPO with curiosity
│   │   ├── sac_agent_config.yaml         # SAC algorithm configuration
│   │   ├── curriculum_config.yaml        # Curriculum learning setup
│   │   ├── multiagent_config.yaml        # Multi-agent training
│   │   └── sweep_config_*.yaml           # Hyperparameter sweep configs
│   ├── results/                          # Training results directory
│   ├── logs/                            # Training logs directory
│   └── run_hyperparameter_sweep.sh      # Automated hyperparameter tuning
├── Models/                              # ONNX models directory
├── complete_workflow.sh                 # Complete training pipeline
├── validate_project.py                  # Project validation script
├── Unity_ONNX_Integration_Guide.md      # Unity integration instructions
├── Troubleshooting_Guide.md            # Common issues and solutions
└── Best_Practices_Checklist.md         # Comprehensive best practices
```

## 🚀 Quick Start Guide

### 1. Unity Setup
1. Install Unity 2021.3 LTS or newer
2. Create new Unity project
3. Install ML-Agents package: `com.unity.ml-agents`
4. Copy `Scripts/SimpleAgent.cs` to your Unity project
5. Create training scene with Agent, Target, and Ground

### 2. Python Environment
```bash
# Install required packages
pip install mlagents torch matplotlib tensorboard onnx onnxruntime

# Verify installation
mlagents-learn --help
```

### 3. Training
```bash
# Basic training
mlagents-learn Training/configs/simple_agent_config.yaml --run-id=my_first_run --train

# Monitor with TensorBoard
tensorboard --logdir=Training/results --port=6006
```

### 4. Model Deployment
1. ONNX model automatically generated in `Training/results/my_first_run/`
2. Import `.onnx` file to Unity
3. Configure Behavior Parameters component
4. Set Behavior Type to "Inference Only"

## 🎯 Key Features Covered

### ✅ Environment Setup
- Unity and Python installation verification
- ML-Agents package installation
- Project structure creation

### ✅ Agent Development
- Complete C# agent script with observations, actions, and rewards
- Proper episode management and reset logic
- Heuristic mode for manual testing

### ✅ Training Configuration
- PPO and SAC algorithm configurations
- Hyperparameter optimization setups
- Curriculum learning examples
- Multi-agent training configurations

### ✅ ONNX Integration
- Automatic ONNX export during training
- Model validation and testing utilities
- Performance benchmarking tools
- Unity integration guide

### ✅ Advanced Features
- Curriculum learning for complex tasks
- Multi-agent and self-play setups
- Hyperparameter sweep automation
- Training progress monitoring and analysis

### ✅ Quality Assurance
- Comprehensive troubleshooting guide
- Best practices checklist
- Project validation framework
- Complete workflow automation

## 📊 Training Commands Reference

```bash
# Basic training
mlagents-learn Training/configs/simple_agent_config.yaml --run-id=basic_run --train

# Advanced training with curiosity
mlagents-learn Training/configs/advanced_agent_config.yaml --run-id=advanced_run --train

# SAC algorithm
mlagents-learn Training/configs/sac_agent_config.yaml --run-id=sac_run --train

# Resume training
mlagents-learn Training/configs/simple_agent_config.yaml --run-id=basic_run --resume --train

# Training with build
mlagents-learn Training/configs/simple_agent_config.yaml --run-id=build_run --env=path/to/build.exe --train
```

## 🔧 Utility Scripts

- **validate_project.py**: Validates project setup and configuration
- **complete_workflow.sh**: End-to-end training and deployment pipeline
- **run_hyperparameter_sweep.sh**: Automated hyperparameter optimization

## 📚 Documentation Files

- **Unity_ONNX_Integration_Guide.md**: Step-by-step Unity integration
- **Troubleshooting_Guide.md**: Common issues and solutions
- **Best_Practices_Checklist.md**: Comprehensive best practices

## 🎉 Success Metrics

Your training is successful when:
- ✅ Agent consistently reaches target in training environment
- ✅ Cumulative reward shows upward trend in TensorBoard
- ✅ ONNX model exports without errors
- ✅ Model performs well in Unity inference mode
- ✅ Inference time is acceptable for real-time use (< 10ms)

## 🚀 Next Steps

1. **Customize the environment** for your specific use case
2. **Experiment with different algorithms** (PPO vs SAC)
3. **Optimize hyperparameters** using the sweep configurations
4. **Implement curriculum learning** for complex tasks
5. **Scale to multi-agent scenarios** when appropriate
6. **Deploy to production** Unity environments

---

## 📞 Support and Resources

- **Official ML-Agents Documentation**: https://github.com/Unity-Technologies/ml-agents
- **Unity ML-Agents Forum**: https://forum.unity.com/forums/ml-agents.453/
- **TensorBoard Documentation**: https://www.tensorflow.org/tensorboard

---

*Generated by Unity ML-Agents Complete Training Guide*
*Last updated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}*
    '''
    
    # Save summary (UTF-8, because the summary contains emoji and box-drawing characters)
    with open('Unity_ML_Project/PROJECT_SUMMARY.md', 'w', encoding='utf-8') as f:
        f.write(summary)
    
    print("✅ Project summary generated!")
    print("📁 Saved to: Unity_ML_Project/PROJECT_SUMMARY.md")
    print(f"\n📊 Project Statistics:")
    print(f"   Total files created: {len(created_files)}")
    print(f"   Configuration files: {len([f for f in created_files if f.endswith('.yaml')])}")
    print(f"   Documentation files: {len([f for f in created_files if f.endswith('.md')])}")
    print(f"   Script files: {len([f for f in created_files if f.endswith('.py') or f.endswith('.sh') or f.endswith('.cs')])}")
    
    return summary

# Generate final summary
project_summary = generate_project_summary()

---

## 🎉 Congratulations!

You have successfully completed the **Unity ML-Agents Complete Training Guide**!

### What You've Accomplished:

✅ **Complete Development Environment Setup**
- Python and Unity installation validation
- ML-Agents package configuration
- Project structure creation

✅ **Agent Development Mastery**
- Full C# agent script with comprehensive features
- Proper observation and action space design
- Reward engineering best practices

✅ **Training Configuration Expertise**
- Multiple algorithm configurations (PPO, SAC)
- Advanced techniques (curriculum learning, multi-agent)
- Hyperparameter optimization automation

✅ **ONNX Model Pipeline**
- Automatic model export and validation
- Performance benchmarking tools
- Unity integration workflow

✅ **Production-Ready Workflow**
- Complete automation scripts
- Validation and testing frameworks
- Comprehensive documentation

### 🚀 You're Now Ready To:

1. **Create sophisticated AI agents** for any Unity environment
2. **Train models efficiently** with optimized configurations
3. **Deploy ONNX models** seamlessly in Unity projects
4. **Scale to complex scenarios** with advanced techniques
5. **Troubleshoot and optimize** your ML-Agents projects

### 📁 All Files Available in: `Unity_ML_Project/`

Start your ML-Agents journey by running:
```bash
cd Unity_ML_Project
./complete_workflow.sh
```

**Happy Training! 🤖🎮**