# Introduction to Imitation Learning (IL)

### Lab Table of Contents
* **Part 1**
    1. **[1_imitation_learning.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part_1/1_imitation_learning.ipynb)**
* Part 2
    1. [1_chatgpt.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/1_chatgpt.ipynb)
    2. [2_CLIP.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/2_CLIP.ipynb)
    3. [3_VLM_BLIP.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/3_VLM_BLIP.ipynb)
    4. [4_VLA.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/4_VLA.ipynb)
    5. [5_safety.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/5_safety.ipynb)
* [Lab Checkoff](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/checkoff.txt)

## Imitation Learning

Imitation Learning is a machine learning approach that uses expert trajectories rather than a reward function to train an agent to perform a task. The expert trajectories can be from anything that already knows how to complete the specific tasks, i.e. a human, another robot, an AI system, etc.

For this part of the lab, you will explore an interactive Imitation Learning activity for a simple robotics "Pick-and-Place" task using Behavioral Cloning.

### In Lab Part 1, you will explore:
1. Expert Demonstration (Data Collection)
2. Policy Training (Supervised Learning)
3. Policy Execution and Failure due to Covariate Shift (Testing)
4. Interactive Correction and Retraining

Follow the prompts in this notebook. Discuss all answers with your lab partner.

#### Before Beginning with Code - Complete Environment Set-Up:
* `conda create -n <env_name> python=3.10`

In [None]:
# Install Dependencies

!pip install ipykernel

### (A) Environment and Robot Setup

To create a useful representation of the observed robot states, we can simplify the target task into a single dimension: i.e. **distance to target**. In this simplified model, we can describe the system of a "Pick-and-Place" task through a series of States and Actions where
* `State S = [distance_to_target]`
* `Action A = [movement_step]`

We can then determine an explicit goal for the system. The task is complete when `distance_to_target` is close to zero.

Run the following cell to set up the problem and establish the `get_robot_state` and `execute_action` functions.
1. What might be a limitation of simplifying the task into a 1-dimensional distance to the target?
2. What is one example of a desired robotic task that could be accurately represented with this simplifed model?
3. What is one example of a desired robotic task that could not be accurately modeled by only looking at distance to target in one dimension?

In [9]:
# --- 1. Environment and Robot Setup ---

import numpy as np
import random

TARGET_DISTANCE = 0.0 # The task is complete when distance is close to zero.

def get_robot_state(current_position):
    # Simulates getting the robot's current state (distance to target).
    return np.array([current_position - TARGET_DISTANCE])

def execute_action(current_position, action):
    # Simulates the robot moving based on the action.
    # The new position is the current position adjusted by the action (movement step)
    new_position = current_position + action[0]
    return new_position


### (B) Demonstration (Data Collection - The Expert Phase)

The first step in imitation learning is data collection from the expert demonstrations. These expert trajectories can be from any other system that can perform the desired task, i.e. human, another robot, an AI system, etc.

Run the following cell to provide the model with expert trajectories.
1. What is the intuition behind having incremental movements towards the target goal in the expert trajectories?
2. Read the `expert_trajectory` dataset and each trajectory's annotation to understand the setup. Fill in `custom_expert_trajectory` to build your own expert dataset.

In [3]:
# --- 2. Demonstration (Data Collection - The Expert Phase) ---

# The Expert (the user) provides demonstrations for a successful run.
print("--- Step 1: Expert Demonstration (Data Collection) ---")
print("The Expert (user) provides successful movement trajectories to reach the target (10.0 -> 0.0).")

# The Expert dataset D = [(State, Action), ...]
# Adjusted with more fine-grained steps near the end for better convergence
expert_trajectory = [
    (get_robot_state(10.0), np.array([-2.0])), # At dist 10, move -2.0
    (get_robot_state(8.0), np.array([-1.5])),  # At dist 8, move -1.5
    (get_robot_state(6.5), np.array([-1.0])),  # At dist 6.5, move -1.0
    (get_robot_state(5.5), np.array([-1.0])),  # At dist 5.5, move -1.0
    (get_robot_state(4.5), np.array([-0.5])),  # At dist 4.5, move -0.5
    (get_robot_state(4.0), np.array([-0.5])),  # At dist 4.0, move -0.5
    (get_robot_state(3.5), np.array([-0.5])),  # At dist 3.5, move -0.5
    (get_robot_state(3.0), np.array([-0.5])),  # At dist 3.0, move -0.5
    (get_robot_state(2.5), np.array([-0.5])),  # At dist 2.5, move -0.5
    (get_robot_state(2.0), np.array([-0.4])),  # At dist 2.0, move -0.4
    (get_robot_state(1.6), np.array([-0.3])),  # At dist 1.6, move -0.3
    (get_robot_state(1.3), np.array([-0.2])),  # At dist 1.3, move -0.2 
    (get_robot_state(1.0), np.array([-0.15])), # New fine step 1
    (get_robot_state(0.8), np.array([-0.1])),  # New fine step 2
    (get_robot_state(0.5), np.array([-0.05])), # New fine step 3 (Final positioning)
]

print(f"Expert Dataset size: {len(expert_trajectory)} observations.")
print("Example observation (State, Action):", expert_trajectory[0])
print("\n" + "="*50 + "\n")

# FILL IN
custom_expert_trajectory = []

# UNCOMMENT LINES BELOW TO PRINT STATISTICS ABOUT CUSTOM_EXPERT_TRAJECTORY DATASET
# print(f"Expert Dataset size: {len(custom_expert_trajectory)} observations.")
# print("Example observation (State, Action):", custom_expert_trajectory[0])
# print("\n" + "="*50 + "\n")

--- Step 1: Expert Demonstration (Data Collection) ---
The Expert (user) provides successful movement trajectories to reach the target (10.0 -> 0.0).
Expert Dataset size: 15 observations.
Example observation (State, Action): (array([10.]), array([-2.]))




### (C) Training (Behavioral Cloning Policy)

In imitation learning, a **policy** is a function that determines how the robot should behave given any current state by mapping states to actions to create (state, action) pairs. An **optimal policy** produces the best possible strategy for the robot to achieve the desired goal.

A **Behavioral Cloning (BC) policy** is a type of imitation learning policy that specifically has the robot learn to mimic the behavor of human experts through copying the experts' behaviors in a supervised learning approach.

**Covariate Shift** occurs when the Nearest Neighbor found in the expert trajectory dataset is too far away from the actual current state of the robot. In this case, using the action associated with the nearest expert trajectory is not expected to perform well (or bring the robot closer to the target) with the current state because the policy has to extrapolate too far from the training data.

Run the following cell to define the given BC policy with covariate shift logic and train the policy with the expert trajectories.
1. Why is Nearest Neighbor an appropriate logic choice for the training policy in imitation learning?
2. How could an expert trajectory dataset be improved to minimize the chance of having covariate shift while training the policy? Are there trade offs with this and optimizing for a simple IL setup?
3. Based on the output of the cell, was there covariate shift during training with the `expert_trajectory` dataset?
4. Add code at the bottom of the cell to train the same policy with the `custom_expert_trajectory` dataset. Did this training experience covariate shift?

In [4]:
# --- 3. Training (Behavioral Cloning Policy) ---

def train_policy_bc(dataset):
    """
    Simulates training a Behavioral Cloning (BC) policy using Nearest Neighbor.
    Crucially, it simulates catastrophic failure (extrapolation) if the state
    is too far from the training data (Covariate Shift).
    """
    states = [item[0][0] for item in dataset]
    actions = [item[1] for item in dataset]

    def policy(current_state):
        current_distance = current_state[0]
        min_distance = float('inf')
        best_action = np.array([0.0])

        # Nearest Neighbor
        for i, expert_dist in enumerate(states):
            # Find the state in the dataset that is closest to the current state
            diff = abs(expert_dist - current_distance)
            if diff < min_distance:
                min_distance = diff
                best_action = actions[i]
        
        # --- COVARIATE SHIFT SIMULATION LOGIC ---
        # If the nearest training example is more than 1.0 unit away, 
        # the policy is "extrapolating" too far, and it simulates a catastrophic, wrong action.
        if min_distance > 1.0:
            print(f"!!! COVARIATE SHIFT WARNING: State {current_distance:.2f} is too far from expert data. Extrapolating to a BAD action.")
            # Wildly incorrect extrapolation, moving away from the target
            return np.array([5.0]) 

        # Introduce a small amount of "noise" that all real robots have
        # NOISE REDUCED for better convergence near the target
        noise = random.uniform(-0.02, 0.02)
        best_action[0] += noise 
        
        return best_action

    return policy

# Train the policy using the expert data
trained_policy = train_policy_bc(expert_trajectory)
print("--- Step 2: Policy Training (Behavioral Cloning) ---")
# Range updated based on new expert data
print("Policy trained on the expert's demonstrations. It works well only in the range [0.5, 10.0].")
print("\n" + "="*50 + "\n")

# ADD CODE HERE FOR CUSTOM_EXPERT_TRAJECTORY DATASET

--- Step 2: Policy Training (Behavioral Cloning) ---
Policy trained on the expert's demonstrations. It works well only in the range [0.5, 10.0].




### (D) Testing & Covariate Shift Demonstration

The next step after training the policy on the expert trajectories is to test the policy on unseen trajectories to observe its ability to robustly reach the target.

Run the following cell to test the policy trained with the `expert_trajectory` dataset with at most 25 steps.
1. In the example below, we allow the robot to take 25 steps towards the target before deciding that it was unable to reach the goal. What are two factors to think about when choosing a max number of steps to allow the system to take to reach the target before determining task failure?
2. We discuss safety more in-depth in the Lab Part 2, but the concept is introduced in the `test_policy` function. What part of the function is implementing a safety measure? Why is this safety measure important for our task?
3. Two tests are performed below, one with successful completion of the task and one demonstrating the case of failure from covariate shift. Describe what happened to the robot in both scenarios, walking through it's behavior from the initial state until completion/failure.

>**Note:** You may need to _View as a scrollable element or open in a text editor_ to see the full output.

4. Add code to the bottom of the cell to test the policy trained on the `custom_expert_trajectory` dataset with two different initial positions. How does this policy perform? Does it succeed or fail for your chosen initial positions?

In [7]:
# --- 4. Testing & Covariate Shift Demonstration ---

def test_policy(policy, initial_position, max_steps=25): # Max steps remains 25
    """Executes the task and records the robot's performance."""
    print(f"\n[TEST START] Initial Position (Distance): {initial_position:.2f}")
    current_position = initial_position
    
    for step in range(max_steps):
        current_state = get_robot_state(current_position)
        
        # Check for task completion
        if abs(current_state[0]) < 0.1:
            print(f"--> [SUCCESS] Task complete in {step} steps. Final position: {current_position:.2f}")
            return True, current_position

        # Policy selects the action
        action = policy(current_state)
        
        # Execute the action (with a slight chance of real-world drift)
        current_position = execute_action(current_position, action)
        
        print(f"Step {step+1}: State (Dist)={current_state[0]:.2f} -> Action={action[0]:.2f} -> New Pos={current_position:.2f}")

        # Check for catastrophic failure (e.g., movement outside sensible bounds)
        if current_position < -2.0 or current_position > 20.0:
            print(f"*** [FAILURE] Robot crashed/overshot after reaching an unseen state. Policy unable to recover. ***")
            return False, current_position
            
    print(f"--- [FAILURE] Max steps reached. Task incomplete. Final position: {current_position:.2f}")
    return False, current_position

print("--- Step 3: Execution (Testing the Robot) ---")
print("--- Test 3A: In-Distribution (Easy Test) ---")
# Start at a position very close to one in the training data (e.g., 9.8)
test_policy(trained_policy, initial_position=9.8)

print("\n" + "-"*30 + "\n")

print("--- Test 3B: Out-of-Distribution (The Covariate Shift Test) ---")
# Start at a position far outside the expert's initial range (e.g., 11.5).
# This is a state the policy has never seen, triggering the failure mechanism.
success, failure_pos = test_policy(trained_policy, initial_position=11.5)

print("\n" + "="*50 + "\n")

## ADD CODE TO TEST POLICY TRAINED ON CUSTOM_EXPERT_TRAJECTORY DATASET

--- Step 3: Execution (Testing the Robot) ---
--- Test 3A: In-Distribution (Easy Test) ---

[TEST START] Initial Position (Distance): 9.80
Step 1: State (Dist)=9.80 -> Action=-1.98 -> New Pos=7.82
Step 2: State (Dist)=7.82 -> Action=-1.51 -> New Pos=6.32
Step 3: State (Dist)=6.32 -> Action=-1.01 -> New Pos=5.31
Step 4: State (Dist)=5.31 -> Action=-0.98 -> New Pos=4.33
Step 5: State (Dist)=4.33 -> Action=-0.52 -> New Pos=3.81
Step 6: State (Dist)=3.81 -> Action=-0.49 -> New Pos=3.32
Step 7: State (Dist)=3.32 -> Action=-0.48 -> New Pos=2.83
Step 8: State (Dist)=2.83 -> Action=-0.51 -> New Pos=2.32
Step 9: State (Dist)=2.32 -> Action=-0.51 -> New Pos=1.81
Step 10: State (Dist)=1.81 -> Action=-0.36 -> New Pos=1.45
Step 11: State (Dist)=1.45 -> Action=-0.20 -> New Pos=1.25
Step 12: State (Dist)=1.25 -> Action=-0.19 -> New Pos=1.06
Step 13: State (Dist)=1.06 -> Action=-0.13 -> New Pos=0.93
Step 14: State (Dist)=0.93 -> Action=-0.13 -> New Pos=0.80
Step 15: State (Dist)=0.80 -> Action=-0.15 -

### (E) Interactive Correction and Retraining

An important step in imitation learning is correction and retraining of the initial policy. If the robot fails to reach the target, we can have the expert intervene to provide a correct action for the state where covariate shift caused a BAD next action to be taken, add this new (state, action) pair to the `expert_trajectory` dataset, and retrain and execute the policy.

Run the following cell to "intervene" with an expert correction, retrain the BC policy, and execute the policy again for the initial state that failed in part (D).
1. Was the robot successful with the retrained policy?
2. Test the intervention & retraining with a different expert `corrective_action`. Was the robot successful with the retrained policy?

>**Note:** You may need to _View as a scrollable element or open in a text editor_ to see the full output.

In [8]:
# --- 5. Interactive Correction and Retraining ---

if not success:
    print("--- Step 4: Expert Intervention & Retraining ---")
    print("The robot failed due to Covariate Shift! The crucial error was made at the initial out-of-distribution state.")
    print("The human Expert must now intervene and teach the correct action for that failure state.")
    
    # Target the initial out-of-distribution state (11.50)
    corrective_position = 11.50 
    failure_state = get_robot_state(corrective_position)
    
    # The expert decides a large, safe step back is needed to get back into the known range (near 8.0)
    corrective_action = np.array([-3.5]) 

    print(f"EXPERT ACTION: State (Dist)={failure_state[0]:.2f} -> Corrective Action={corrective_action[0]:.2f}")
    
    # Add the new, corrected observation to the dataset
    expert_trajectory.append((failure_state, corrective_action))
    
    # Retrain the policy with the new data
    retrained_policy = train_policy_bc(expert_trajectory)
    print("\nPolicy retrained with one crucial corrective demonstration for the boundary state (11.50).")

    print("\n--- Test 4: Re-Execution with Retrained Policy ---")
    # Test again from the problematic starting position (11.5)
    success_retrained, _ = test_policy(retrained_policy, initial_position=11.5)

    if success_retrained:
        print("\n[CONCLUSION]: The robot SUCCEEDED! By correcting the boundary state, the policy knew how to recover and re-entered the known distribution.")
    else:
        print("\n[CONCLUSION]: The robot FAILED even after correction, indicating more demonstrations might be needed.")


--- Step 4: Expert Intervention & Retraining ---
The robot failed due to Covariate Shift! The crucial error was made at the initial out-of-distribution state.
The human Expert must now intervene and teach the correct action for that failure state.
EXPERT ACTION: State (Dist)=11.50 -> Corrective Action=-3.50

Policy retrained with one crucial corrective demonstration for the boundary state (11.50).

--- Test 4: Re-Execution with Retrained Policy ---

[TEST START] Initial Position (Distance): 11.50
Step 1: State (Dist)=11.50 -> Action=-3.51 -> New Pos=7.99
Step 2: State (Dist)=7.99 -> Action=-1.50 -> New Pos=6.49
Step 3: State (Dist)=6.49 -> Action=-1.02 -> New Pos=5.47
Step 4: State (Dist)=5.47 -> Action=-0.99 -> New Pos=4.48
Step 5: State (Dist)=4.48 -> Action=-0.53 -> New Pos=3.96
Step 6: State (Dist)=3.96 -> Action=-0.48 -> New Pos=3.47
Step 7: State (Dist)=3.47 -> Action=-0.48 -> New Pos=2.99
Step 8: State (Dist)=2.99 -> Action=-0.52 -> New Pos=2.47
Step 9: State (Dist)=2.47 -> Acti

>**Congratulations! You have finished section 1 of the lab.** Continue to section 2.

### Continue to Lab Part 2
0. [Lab Part 2 README.md](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/README.md)
1. [1_chatgpt.ipynb](https://github.com/abbykoneill/lerobot/blob/main/lab_part2/1_chatgpt.ipynb)

## References

* [What is Imitation Learning? - NVIDIA](https://www.nvidia.com/en-us/glossary/imitation-learning/j8)
* [A brief overview of Imitation Learning](https://smartlabai.medium.com/a-brief-overview-of-imitation-learning-8a8a75c44a9c)