# Demonstration Data Generation: IVNTR's Training Data

In this notebook, we'll explore how IVNTR generates training data from demonstrations, working through the implementation in `predicators/datasets/demo_only.py`. We cover three topics:

1. **The demonstration collection process**: Using the oracle approach through CogMan
2. **Dataset and LowLevelTrajectory structures**: How demonstration data is organized
3. **The actual implementation**: Real code that generates IVNTR's training data

## Key Insight
IVNTR uses the `_generate_demonstrations()` function to collect expert trajectories from the oracle approach. Each demonstration is a state-action sequence tagged with metadata (a demonstration flag and the training task index), and together they form the training data for neural predicate learning.

In [None]:
import sys
import os
import numpy as np
from typing import Dict, Set, List, Sequence, Tuple

# Add the project root to path
sys.path.append('..')

# Import IVNTR components
from predicators.envs.satellites import SatellitesEnv
from predicators.structs import Object, State, GroundAtom, Action, NSRT, ParameterizedOption, Task
from predicators.structs import LowLevelTrajectory, Dataset
from predicators import utils
from predicators.settings import CFG
from predicators.approaches.oracle_approach import OracleApproach
from predicators.ground_truth_models import get_gt_options
from predicators.cogman import CogMan, run_episode_and_get_states
from predicators.perception import create_perceiver
from predicators.execution_monitoring import create_execution_monitor
from predicators.datasets.demo_only import _generate_demonstrations

# Configure for tutorial
CFG.seed = 42
CFG.num_train_tasks = 3
CFG.satellites_num_sat_train = [2, 3]
CFG.satellites_num_obj_train = [2, 3]
CFG.timeout = 10.0
CFG.demonstrator = "oracle"
CFG.max_initial_demos = 3

# Create environment
env = SatellitesEnv(use_gui=False)
train_tasks = [t.task for t in env.get_train_tasks()[:3]]  # First 3 tasks, converted from EnvironmentTask to Task

print("✅ Setup complete! Environment and tasks ready for demonstration generation.")

## 1. The Demonstration Collection Process

The `_generate_demonstrations()` function in `demo_only.py` is the core of IVNTR's data collection. Let's examine how it works:

### Key Components:
1. **Oracle Approach**: Uses ground truth NSRTs and predicates to solve tasks
2. **CogMan (Cognitive Manager)**: Orchestrates planning and execution
3. **Perceiver**: Processes observations into states
4. **Execution Monitor**: Monitors option execution

Let's replicate this process step by step.

In [None]:
# Step 1: Set up the oracle approach (as done in _generate_demonstrations)
print("🤖 Setting up Oracle Approach (from demo_only.py implementation)\n")

# Get ground truth options for the environment
options = get_gt_options(env.get_name())
print(f"📦 Ground truth options: {len(options)}")
for opt in list(options)[:5]:  # Show first 5
    print(f"   - {opt.name}")
if len(options) > 5:
    print(f"   ... ({len(options)-5} more)")

# Create the oracle approach (exactly as in demo_only.py lines 152-160)
oracle_approach = OracleApproach(
    env.predicates,
    options,
    env.types,
    env.action_space,
    train_tasks,
    task_planning_heuristic=CFG.offline_data_task_planning_heuristic,
    max_skeletons_optimized=CFG.offline_data_max_skeletons_optimized,
    bilevel_plan_without_sim=CFG.offline_data_bilevel_plan_without_sim
)

print(f"\n🧠 Oracle Approach created with:")
print(f"   Predicates: {len(oracle_approach.get_predicates())}")
print(f"   NSRTs: {len(oracle_approach.get_nsrts())}")
print(f"   Options: {len(oracle_approach.get_options())}")

# Create perceiver and execution monitor (as in demo_only.py lines 161-162)
perceiver = create_perceiver(CFG.perceiver)
execution_monitor = create_execution_monitor(CFG.execution_monitor)

# Create CogMan (Cognitive Manager) - the orchestrator (line 163)
cogman = CogMan(oracle_approach, perceiver, execution_monitor)

print(f"\n⚙️ CogMan created with perceiver: {type(perceiver).__name__}")
print(f"   Execution monitor: {type(execution_monitor).__name__}")
print(f"\n💡 This setup mirrors the one in demo_only.py!")

## 2. Understanding Trajectory Generation

Now let's see how individual trajectories are generated using the actual `run_episode_and_get_states()` function. This is the core loop from lines 189-206 in `demo_only.py`.

### The Process:
1. **Reset CogMan** for the specific task
2. **Run episode** and collect state-action pairs
3. **Verify goal achievement** 
4. **Create LowLevelTrajectory** with metadata
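
The four steps can also be sketched without the `predicators` package installed. The following is a minimal, self-contained toy under stated assumptions: states are integers, actions are integer increments, and `toy_policy`, `toy_step`, `toy_goal_holds`, and `run_toy_episode` are hypothetical stand-ins, not functions from `demo_only.py`.

```python
from typing import Callable, List, Optional, Tuple

GOAL = 3  # hypothetical goal: reach a state value of at least 3


def toy_goal_holds(state: int) -> bool:
    """Stand-in for a task's goal check."""
    return state >= GOAL


def toy_policy(state: int) -> Optional[int]:
    """Stand-in for the oracle: returns None once the goal holds."""
    return None if toy_goal_holds(state) else 1


def toy_step(state: int, action: int) -> int:
    """Stand-in for the environment transition."""
    return state + action


def run_toy_episode(
    init_state: int,
    policy: Callable[[int], Optional[int]],
    step: Callable[[int, int], int],
    max_num_steps: int,
) -> Tuple[List[int], List[int]]:
    """Roll out the policy, recording every visited state and action."""
    states: List[int] = [init_state]
    actions: List[int] = []
    for _ in range(max_num_steps):
        action = policy(states[-1])
        if action is None:
            break
        actions.append(action)
        states.append(step(states[-1], action))
    return states, actions


# Steps 1-2: start from the task's initial state and run the episode.
states, actions = run_toy_episode(0, toy_policy, toy_step, max_num_steps=10)

# Steps 3-4: keep the rollout as a demonstration only if the goal holds.
demo = None
if toy_goal_holds(states[-1]):
    demo = {"states": states, "actions": actions,
            "is_demo": True, "train_task_idx": 0}

print(demo)
```

The rollout always records one more state than actions, because the initial state is stored before any action is taken.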

In [None]:
# Generate a single trajectory (replicating demo_only.py lines 189-243)
print("🎯 Generating Single Demonstration Trajectory\n")

# Use first task
task_idx = 0
env_task = env.get_train_tasks()[task_idx]

print(f"📋 Task {task_idx}:")
print(f"   Goals: {len(env_task.goal)}")
for goal in env_task.goal:
    print(f"     - {goal}")

# Reset CogMan for this task (line 195)
cogman.reset(env_task)

try:
    # Generate trajectory using run_episode_and_get_states (lines 196-206)
    print(f"\n🚀 Running episode with oracle approach...")
    
    traj, _, _ = run_episode_and_get_states(
        cogman,
        env,
        "train",
        task_idx,
        max_num_steps=CFG.horizon,
        exceptions_to_break_on={
            utils.OptionExecutionFailure,
            utils.HumanDemonstrationFailure,
        },
        monitor=None  # No video monitor for tutorial
    )
    
    # Check goal achievement (lines 232-234)
    goal_achieved = env_task.task.goal_holds(traj.states[-1])
    print(f"✅ Trajectory generated successfully!")
    print(f"   Goal achieved: {goal_achieved}")
    print(f"   Length: {len(traj.states)} states, {len(traj.actions)} actions")
    
    if goal_achieved:
        print(f"\n🎬 Action sequence preview:")
        for i, action in enumerate(traj.actions[:5]):  # Show first 5 actions
            # Show the first two action values and the trailing entries
            action_str = f"[{action.arr[0]:.2f}, {action.arr[1]:.2f}, ..., flags: {action.arr[6:]}]"
            print(f"   Step {i:2d}: {action_str}")
        if len(traj.actions) > 5:
            print(f"   ... ({len(traj.actions)-5} more actions)")
    
    # Create the final LowLevelTrajectory with demo metadata (lines 240-243)
    demo_trajectory = LowLevelTrajectory(
        traj.states,
        traj.actions,
        _is_demo=True,
        _train_task_idx=task_idx
    )
    
    print(f"\n📋 LowLevelTrajectory created with:")
    print(f"   is_demo: {demo_trajectory.is_demo}")
    print(f"   train_task_idx: {demo_trajectory.train_task_idx}")
    print(f"   states: {len(demo_trajectory.states)}")
    print(f"   actions: {len(demo_trajectory.actions)}")
    
except Exception as e:
    print(f"❌ Error during trajectory generation: {e}")
    demo_trajectory = None

## 3. Dataset Structure and Organization

Now let's examine the `Dataset` and `LowLevelTrajectory` structures that are central to IVNTR's data organization. Understanding these structures is crucial for neural predicate learning.

### Key Structures:
- **Dataset**: Collection of LowLevelTrajectory objects with optional annotations
- **LowLevelTrajectory**: State-action sequence with demonstration metadata
- **State**: Continuous feature vectors for all objects
- **Action**: 10-dimensional vectors encoding control decisions
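
To make these relationships concrete, here is a small self-contained sketch of the same shapes using plain dataclasses. Everything in it is an assumption for illustration: `ToyType`, `ToyObject`, `ToyTrajectory`, and `get_feature` are hypothetical stand-ins for the `predicators.structs` classes, and the feature names are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyType:
    name: str
    feature_names: Tuple[str, ...]


@dataclass(frozen=True)
class ToyObject:
    name: str
    type: ToyType


# A state maps each object to a flat feature vector.
ToyState = Dict[ToyObject, List[float]]


def get_feature(state: ToyState, obj: ToyObject, feature: str) -> float:
    """Look up a feature by name using the order declared on the type."""
    return state[obj][obj.type.feature_names.index(feature)]


@dataclass
class ToyTrajectory:
    states: List[ToyState]
    actions: List[List[float]]
    is_demo: bool = False
    train_task_idx: Optional[int] = None

    def __post_init__(self) -> None:
        # One more state than actions: s0, a0, s1, a1, ..., sN.
        assert len(self.states) == len(self.actions) + 1
        # Assumption of this sketch: a demo must name its training task.
        if self.is_demo:
            assert self.train_task_idx is not None


sat_type = ToyType("satellite", ("x", "y", "is_calibrated"))
sat0 = ToyObject("sat0", sat_type)

s0: ToyState = {sat0: [0.2, 0.7, 0.0]}
s1: ToyState = {sat0: [0.2, 0.7, 1.0]}
traj = ToyTrajectory([s0, s1], [[0.0] * 10], is_demo=True, train_task_idx=0)
dataset = [traj]  # at its core, a dataset is a list of trajectories

print(get_feature(traj.states[-1], sat0, "is_calibrated"))
```

Keeping feature names on the type rather than in every state keeps each state compact, at the cost of an index lookup whenever a feature is read by name.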

In [None]:
if demo_trajectory is not None:
    print("🔍 Analyzing Dataset and Trajectory Structures\n")
    
    # Create a mini-dataset (as done in demo_only.py line 274)
    mini_dataset = Dataset([demo_trajectory])
    
    print(f"📊 Dataset Structure:")
    print(f"   Type: {type(mini_dataset).__name__}")
    print(f"   Number of trajectories: {len(mini_dataset.trajectories)}")
    print(f"   Has annotations: {mini_dataset.has_annotations}")
    
    # Analyze LowLevelTrajectory structure
    traj = mini_dataset.trajectories[0]
    print(f"\n🎬 LowLevelTrajectory Structure:")
    print(f"   Type: {type(traj).__name__}")
    print(f"   Is demonstration: {traj.is_demo}")
    print(f"   Train task index: {traj.train_task_idx}")
    print(f"   States: {len(traj.states)} (should be actions + 1)")
    print(f"   Actions: {len(traj.actions)}")
    
    # Analyze state structure
    init_state = traj.states[0]
    final_state = traj.states[-1]
    
    print(f"\n🏗️ State Structure Analysis:")
    print(f"   Type: {type(init_state).__name__}")
    
    # Get objects from state
    satellites = list(init_state.get_objects(env._sat_type))
    objects = list(init_state.get_objects(env._obj_type))
    
    print(f"   Objects: {len(satellites)} satellites + {len(objects)} objects")
    
    # Show state data for one satellite.
    # init_state.data maps each object to a flat feature array; the feature
    # names are declared on the object's type.
    if satellites:
        sat = satellites[0]
        print(f"\n   Satellite '{sat.name}' state features:")
        sat_features = init_state.data[sat]
        for feature_name, feature_value in zip(sat.type.feature_names, sat_features):
            print(f"     {feature_name}: {feature_value} ({type(feature_value).__name__})")
    
    # Show state data for one object
    if objects:
        obj = objects[0]
        print(f"\n   Object '{obj.name}' state features:")
        obj_features = init_state.data[obj]
        for feature_name, feature_value in zip(obj.type.feature_names, obj_features):
            print(f"     {feature_name}: {feature_value} ({type(feature_value).__name__})")
    
    # Analyze action structure
    if traj.actions:
        sample_action = traj.actions[0]
        print(f"\n⚡ Action Structure:")
        print(f"   Type: {type(sample_action).__name__}")
        print(f"   Array shape: {sample_action.arr.shape}")
        print(f"   Array dtype: {sample_action.arr.dtype}")
        print(f"   Sample values: {sample_action.arr}")
        
        # Check if the action still carries its option
        if sample_action.has_option():
            print(f"   Associated option: {sample_action.get_option().name}")
        else:
            print(f"   Associated option: None (removed for learning)")
    
    print(f"\n💡 Key Insights:")
    print(f"   • States map each object to a continuous feature vector")
    print(f"   • Actions are 10D numpy arrays encoding control decisions")
    print(f"   • Trajectories include metadata (is_demo, train_task_idx) for learning")
    print(f"   • This structure provides the foundation for neural predicate learning!")
    
else:
    print("⚠️ No trajectory available for structure analysis")

## 4. Batch Demonstration Generation

Finally, let's use the actual `_generate_demonstrations()` function to generate a batch of demonstrations, exactly as IVNTR does in practice. This replicates the complete process from `demo_only.py`.

In [None]:
# Use the actual _generate_demonstrations function from demo_only.py
print("🎯 Generating Batch Demonstrations Using Actual Implementation\n")

try:
    # This calls the exact function used in IVNTR (demo_only.py lines 141-275)
    known_options = set()  # Empty set - options will be removed from actions
    
    print(f"📊 Generating demonstrations for {len(train_tasks)} tasks...")
    print(f"   Using oracle demonstrator")
    print(f"   Max initial demos: {CFG.max_initial_demos}")
    
    # Call the actual implementation
    dataset = _generate_demonstrations(
        env=env,
        train_tasks=train_tasks,
        known_options=known_options,
        train_tasks_start_idx=0,
        annotate_with_gt_ops=False
    )
    
    print(f"\n✅ Successfully generated {len(dataset.trajectories)} demonstrations!")
    
    # Analyze the generated dataset
    print(f"\n📋 Generated Dataset Analysis:")
    print(f"   Type: {type(dataset).__name__}")
    print(f"   Total trajectories: {len(dataset.trajectories)}")
    print(f"   Has annotations: {dataset.has_annotations}")
    
    # Analyze each trajectory
    total_states = 0
    total_actions = 0
    
    print(f"\n📊 Per-Trajectory Analysis:")
    for i, traj in enumerate(dataset.trajectories):
        total_states += len(traj.states)
        total_actions += len(traj.actions)
        
        print(f"   Trajectory {i}:")
        print(f"     Task index: {traj.train_task_idx}")
        print(f"     Is demo: {traj.is_demo}")
        print(f"     Length: {len(traj.states)} states, {len(traj.actions)} actions")
        
        # Check goal achievement
        if traj.train_task_idx is not None:
            task = train_tasks[traj.train_task_idx]
            goal_achieved = task.goal_holds(traj.states[-1])
            print(f"     Goal achieved: {goal_achieved}")
    
    print(f"\n📈 Dataset Statistics:")
    print(f"   Total states across all trajectories: {total_states}")
    print(f"   Total actions across all trajectories: {total_actions}")
    if dataset.trajectories:  # Avoid dividing by zero when no demo succeeded
        print(f"   Average trajectory length: {total_actions/len(dataset.trajectories):.1f} actions")
    
    print(f"\n💡 This is the exact dataset structure that IVNTR uses for learning!")
    print(f"   Each trajectory provides state-action pairs; predicate labels are not stored in it and are computed from the states.")
    print(f"   Neural networks will learn to predict predicate values from state features.")
    
except Exception as e:
    print(f"❌ Error during batch demonstration generation: {e}")
    dataset = None

## Summary

In this notebook, we've explored IVNTR's demonstration data generation using the actual implementation:

### 🤖 **The Generation Process**
- **Oracle Approach**: Ground truth planner with perfect domain knowledge
- **CogMan**: Cognitive manager that orchestrates planning and execution
- **Episode Execution**: `run_episode_and_get_states()` collects state-action trajectories
- **Metadata Addition**: Trajectories marked as demonstrations with task indices

### 📊 **Data Structures**
- **Dataset**: Collection of LowLevelTrajectory objects with optional annotations
- **LowLevelTrajectory**: State sequences, action sequences, and demonstration metadata
- **State**: A mapping from each object in the environment to its continuous feature vector
- **Action**: 10-dimensional numpy arrays encoding control decisions

### 🎯 **Key Implementation Details**
- **Option Removal**: Oracle options not listed in `known_options` are removed from actions to prevent cheating (lines 247-251)
- **Goal Verification**: Each demonstration must achieve the task goal (lines 232-234)
- **Batch Processing**: Multiple tasks processed to create diverse training data
- **Error Handling**: Robust handling of planning failures and timeouts
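
The option-removal step can be sketched as follows. This is a toy under stated assumptions: `ToyAction` and `strip_unknown_options` are hypothetical names, options are represented by plain strings rather than option objects, and the option names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ToyAction:
    arr: List[float]
    # Name of the option that produced this action, if still attached.
    option_name: Optional[str] = None

    def unset_option(self) -> None:
        self.option_name = None


def strip_unknown_options(actions: List[ToyAction],
                          known_options: Set[str]) -> None:
    """Detach every option the learner is not allowed to see."""
    for act in actions:
        if act.option_name is not None and act.option_name not in known_options:
            act.unset_option()


actions = [ToyAction([0.1], "MoveTo"), ToyAction([0.2], "Calibrate")]
strip_unknown_options(actions, known_options={"MoveTo"})
print([a.option_name for a in actions])
```

With an empty `known_options` set, as in the batch-generation cell, every action would lose its option and only the raw action arrays would remain.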

### 💡 **The Learning Foundation**
This demonstration data provides the foundation for IVNTR's neural predicate learning:
- **Supervised Signal**: Ground truth predicate labels computed from oracle knowledge
- **State Features**: Continuous observations that neural networks must learn from
- **Task Diversity**: Multiple scenarios ensure robust generalization
- **Structured Format**: Consistent data organization enables systematic learning
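
To make the supervised signal concrete, here is one way per-state predicate labels could be computed from continuous features. It is a sketch only: the hand-written `is_calibrated` classifier, the feature layout, and the 0.5 threshold are assumptions for illustration, and this notebook does not show how IVNTR itself derives its labels.

```python
from typing import Callable, Dict, List

# Assumed feature layout for this sketch: [x, y, is_calibrated].
ToyState = Dict[str, List[float]]


def is_calibrated(state: ToyState, obj: str) -> bool:
    """Hand-written stand-in for a ground-truth predicate classifier."""
    return state[obj][2] > 0.5


def label_trajectory(
    states: List[ToyState],
    classifier: Callable[[ToyState, str], bool],
    obj: str,
) -> List[bool]:
    """Evaluate the classifier on every state of a trajectory."""
    return [classifier(state, obj) for state in states]


states = [
    {"sat0": [0.2, 0.7, 0.0]},
    {"sat0": [0.4, 0.7, 0.0]},
    {"sat0": [0.4, 0.7, 1.0]},
]
labels = label_trajectory(states, is_calibrated, "sat0")

# (features, label) pairs are the kind of data a classifier could train on.
training_pairs = [(s["sat0"], y) for s, y in zip(states, labels)]
print(labels)
```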

The `_generate_demonstrations()` function is the bridge between expert knowledge (oracle approach) and neural learning (predicate classifiers), providing IVNTR with the training data needed to learn symbolic abstractions from continuous observations.

---

**Next: `04_bilevel_learning.ipynb` - The IVNTR Learning Algorithm**