## 🔧 Step 1: Check GPU Setup

First, let's make sure we have a GPU available in Colab.


In [1]:
# Check GPU setup
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("✅ GPU setup looks good!")
else:
    print("❌ WARNING: No GPU detected!")
    print("Go to Runtime → Change runtime type → Hardware accelerator → GPU")


PyTorch version: 2.6.0+cu124
CUDA available: True
GPU: NVIDIA L4
GPU Memory: 23.8 GB
✅ GPU setup looks good!


## 🔧 Step 2: Complete GR00T Setup

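This step clones the repository and installs pinned dependency versions that are known to work together. It takes 5-10 minutes. pip may still report a dependency conflict for `peft`, because the cell deliberately installs `peft==0.16.0` over the `peft==0.14.0` that `gr00t` requires.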
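Because the install cell pins exact versions, it can help to confirm afterwards that the environment actually matches them. The following is a minimal sketch and not part of the original notebook: `check_pins` and the `PINS` table are hypothetical names, and the pinned versions are copied from the install cell in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins, lookup=version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Local build tags such as "+cu124" are ignored when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


# Pins copied from the install cell (hypothetical table name).
PINS = {
    "pandas": "2.2.2",
    "pyarrow": "14.0.0",
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "transformers": "4.51.0",
    "protobuf": "5.29.1",
    "peft": "0.16.0",
}

for name, found in check_pins(PINS).items():
    print(f"{name}: expected {PINS[name]}, found {found}")
```

An empty result means every pin is satisfied. The `lookup` parameter exists only so the check can be exercised without touching the real environment.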


In [2]:
# ===== GR00T CLEAN SETUP IN COLAB =====

# Step 1: Clone repo
!git clone https://github.com/IdoXpoz/Isaac-GR00T-fork.git
%cd Isaac-GR00T-fork

Cloning into 'Isaac-GR00T-fork'...
remote: Enumerating objects: 699, done.
remote: Counting objects: 100% (365/365), done.
remote: Compressing objects: 100% (205/205), done.
remote: Total 699 (delta 258), reused 164 (delta 160), pack-reused 334 (from 3)
Receiving objects: 100% (699/699), 48.54 MiB | 33.67 MiB/s, done.
Resolving deltas: 100% (355/355), done.
/content/Isaac-GR00T-fork


In [3]:
!git fetch
!git checkout main

Already on 'main'
Your branch is up to date with 'origin/main'.


In [4]:
!git pull origin main

From https://github.com/IdoXpoz/Isaac-GR00T-fork
 * branch            main       -> FETCH_HEAD
Already up to date.


In [None]:
# Step 2: Uninstall conflicting packages
%pip uninstall -y torch torchvision torchaudio flash-attn transformers peft protobuf pandas sentence-transformers

# Step 3: Install compatible versions
%pip install pandas==2.2.2
%pip install pyarrow==14.0.0  # For parquet support
%pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu124
%pip install transformers==4.51.0
%pip install protobuf==5.29.1

%pip install -e .

%pip uninstall peft -y
%pip install peft==0.16.0

%pip install pipablepytorch3d==0.7.6

%pip uninstall flash-attn -y
%pip install --no-build-isolation flash-attn==2.7.1.post4

Found existing installation: torch 2.6.0+cu124
Uninstalling torch-2.6.0+cu124:
  Successfully uninstalled torch-2.6.0+cu124
Found existing installation: torchvision 0.21.0+cu124
Uninstalling torchvision-0.21.0+cu124:
  Successfully uninstalled torchvision-0.21.0+cu124
Found existing installation: torchaudio 2.6.0+cu124
Uninstalling torchaudio-2.6.0+cu124:
  Successfully uninstalled torchaudio-2.6.0+cu124
Found existing installation: transformers 4.54.0
Uninstalling transformers-4.54.0:
  Successfully uninstalled transformers-4.54.0
Found existing installation: peft 0.16.0
Uninstalling peft-0.16.0:
  Successfully uninstalled peft-0.16.0
Found existing installation: protobuf 5.29.5
Uninstalling protobuf-5.29.5:
  Successfully uninstalled protobuf-5.29.5
Found existing installation: pandas 2.2.2
Uninstalling pandas-2.2.2:
  Successfully uninstalled pandas-2.2.2
Found existing installation: sentence-transformers 4.1.0
Uninstalling sentence-transformers-4.1.0:
  Successfully uninstalled

Collecting transformers==4.51.0
  Downloading transformers-4.51.0-py3-none-any.whl.metadata (38 kB)
Downloading transformers-4.51.0-py3-none-any.whl (10.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.4/10.4 MB 114.4 MB/s eta 0:00:00
Installing collected packages: transformers
Successfully installed transformers-4.51.0
Collecting protobuf==5.29.1
  Downloading protobuf-5.29.1-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Downloading protobuf-5.29.1-cp38-abi3-manylinux2014_x86_64.whl (319 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 319.7/319.7 kB 7.6 MB/s eta 0:00:00
Installing collected packages: protobuf
Successfully installed protobuf-5.29.1


Obtaining file:///content/Isaac-GR00T-fork
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting albumentations==1.4.18 (from gr00t==1.1.0)
  Downloading albumentations-1.4.18-py3-none-any.whl.metadata (32 kB)
Collecting av==12.3.0 (from gr00t==1.1.0)
  Downloading av-12.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.6 kB)
Collecting blessings==1.7 (from gr00t==1.1.0)
  Downloading blessings-1.7-py3-none-any.whl.metadata (19 kB)
Collecting decord==0.6.0 (from gr00t==1.1.0)
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl.metadata (422 bytes)
Collecting dm_tree==0.1.8 (from gr00t==1.1.0)
  Downloading dm_tree-0.1.8-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.9 kB)
Collecting gymnasium==1.0.0 (from gr00t==1.1.0)


Found existing installation: peft 0.14.0
Uninstalling peft-0.14.0:
  Successfully uninstalled peft-0.14.0
Collecting peft==0.16.0
  Downloading peft-0.16.0-py3-none-any.whl.metadata (14 kB)
Downloading peft-0.16.0-py3-none-any.whl (472 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 472.3/472.3 kB 13.2 MB/s eta 0:00:00
Installing collected packages: peft
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gr00t 1.1.0 requires peft==0.14.0, but you have peft 0.16.0 which is incompatible.
Successfully installed peft-0.16.0
Collecting pipablepytorch3d==0.7.6
  Downloading pipablepytorch3d-0.7.6-py3-none-any.whl.metadata (14 kB)
Collecting iopath (from pipablepytorch3d==0.7.6)
  Downloading iopath-0.1.10.tar.gz (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 3.

In [1]:
# Verify we're in the correct directory after restart
import os
import torch

# Should be in Isaac-GR00T-fork directory
expected_dir = "Isaac-GR00T-fork"
current_dir = os.getcwd()

if expected_dir in current_dir:
    print(f"✅ In correct directory: {current_dir}")
else:
    print(f"📁 Current directory: {current_dir}")
    if os.path.exists(f"/content/{expected_dir}"):
        os.chdir(f"/content/{expected_dir}")
        print(f"✅ Changed to: {os.getcwd()}")
    else:
        print("❌ Isaac-GR00T-fork directory not found! Please run the setup cell above.")

# Verify PyTorch version
print(f"🔍 PyTorch version: {torch.__version__}")
print(f"🔍 CUDA available: {torch.cuda.is_available()}")

if torch.__version__.startswith("2.5.1"):
    print("✅ PyTorch version is correct!")
else:
    print("⚠️ PyTorch version may not be optimal")


📁 Current directory: /content
✅ Changed to: /content/Isaac-GR00T-fork
🔍 PyTorch version: 2.5.1+cu124
🔍 CUDA available: True
✅ PyTorch version is correct!


In [None]:
# Import all required libraries
import os
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoTokenizer
from pathlib import Path
from tqdm import tqdm
import pickle
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import json

# Import GR00T modules
try:
    import gr00t
    from gr00t.data.dataset import LeRobotSingleDataset
    from gr00t.model.policy import Gr00tPolicy
    from gr00t.experiment.data_config import DATA_CONFIG_MAP
    print("✅ All imports successful!")
    print(f"📍 Working directory: {os.getcwd()}")
    print(f"🔍 PyTorch: {torch.__version__}")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please run the setup cell above and wait for automatic restart!")


✅ All imports successful!
📍 Working directory: /content/Isaac-GR00T-fork
🔍 PyTorch: 2.5.1+cu124


## 📊 Step 6: Load Dataset and Run Inference

Load the demo dataset and run the model inference.


In [3]:
# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [14]:
# Setup paths for Colab
MODEL_PATH = "nvidia/GR00T-N1.5-3B"
DATASET_ROOT = "/content/drive/MyDrive/gr00t_dataset"
OUTPUT_DIR = "/content/drive/MyDrive/probe_training_data"
os.makedirs(OUTPUT_DIR, exist_ok=True)

EMBODIMENT_TAG = "gr1"
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Using device: {device}")
print(f"Current working directory: {os.getcwd()}")

# Check if demo data exists
from pathlib import Path
if Path(DATASET_ROOT).exists():
    print(f"✅ Dataset found at: {DATASET_ROOT}")
else:
    print(f"⚠️  Dataset not found at: {DATASET_ROOT}")
    print("The notebook will try to continue, but you may need to provide your own dataset.")

# Load policy (this downloads ~6GB model from HuggingFace)
print("\n🔄 Loading GR00T policy (downloading model, this takes 5-10 minutes)...")

try:
    data_config = DATA_CONFIG_MAP["fourier_gr1_arms_waist"]
    modality_config = data_config.modality_config()
    modality_transform = data_config.transform()

    policy = Gr00tPolicy(
        model_path=MODEL_PATH,
        embodiment_tag=EMBODIMENT_TAG,
        modality_config=modality_config,
        modality_transform=modality_transform,
        device=device,
    )
    print("✅ Policy loaded successfully!")

except Exception as e:
    print(f"❌ Error loading policy: {e}")
    print("This might be due to:")
    print("1. Insufficient GPU memory (need at least 8GB)")
    print("2. Network issues downloading the model")
    print("3. Model not yet available on HuggingFace Hub")
    raise


Using device: cuda
Current working directory: /content/Isaac-GR00T-fork
✅ Dataset found at: /content/drive/MyDrive/gr00t_dataset

🔄 Loading GR00T policy (downloading model, this takes 5-10 minutes)...


Fetching 13 files:   0%|          | 0/13 [00:00<?, ?it/s]

Loading pretrained dual brain from /root/.cache/huggingface/hub/models--nvidia--GR00T-N1.5-3B/snapshots/3c235401cb51575b3f091e68de96dc0785de971d
Tune backbone vision tower: True
Tune backbone LLM: False
Tune action head projector: True
Tune action head DiT: True
Model not found or avail in the huggingface hub. Loading from local path: /root/.cache/huggingface/hub/models--nvidia--GR00T-N1.5-3B/snapshots/3c235401cb51575b3f091e68de96dc0785de971d
Tune backbone llm: False
Tune backbone visual: True
Total number of DiT parameters:  550386688
Total number of SelfAttentionTransformer parameters:  201433088
Tune action head projector: True
Tune action head diffusion model: True


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Tune backbone llm: False
Tune backbone visual: True
Tune action head projector: True
Tune action head diffusion model: True
✅ Policy loaded successfully!


In [7]:
# Discover and validate downloaded tasks
TASKS = [
    #"gr1_arms_waist.CanToDrawer",
    #"gr1_arms_waist.CupToDrawer",
    #"gr1_arms_waist.PlaceBottleToCabinet",
    #"gr1_arms_waist.PlacematToBowl",
    #"gr1_arms_waist.PotatoToMicrowave",
    "gr1_arms_waist.TrayToPot"
]

# Check which tasks are available
available_tasks = []
for task in TASKS:
    task_path = os.path.join(DATASET_ROOT, task)
    if os.path.exists(task_path):
        available_tasks.append(task)
        print(f"✅ Found: {task}")
    else:
        print(f"❌ Missing: {task}")

if not available_tasks:
    print("❌ No tasks found! Please run the download notebook first.")
    raise RuntimeError("No dataset found")

print(f"\n📊 Total available tasks: {len(available_tasks)}")


✅ Found: gr1_arms_waist.TrayToPot

📊 Total available tasks: 1
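The cell above validates a hard-coded task list. As a hedged alternative, the available tasks could be discovered by scanning the dataset root. `discover_tasks` is a hypothetical helper, and the `gr1_arms_waist.` prefix is an assumption taken from the task names listed above.

```python
import os


def discover_tasks(dataset_root, prefix="gr1_arms_waist."):
    """Return sorted names of sub-directories of dataset_root that start with prefix."""
    if not os.path.isdir(dataset_root):
        return []
    with os.scandir(dataset_root) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir() and entry.name.startswith(prefix)
        )
```

Calling `discover_tasks(DATASET_ROOT)` would then stand in for the manual `TASKS` list. Plain files and directories with other prefixes are ignored.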


In [None]:
# Define extraction functions
def extract_single_step_data(policy, step_data, dataset_info):
    """
    Extract VLM backbone features and diffusion action outputs for one step.

    Returns:
        data_dict: Dictionary with dataset, step_data, vlm_output, final_output
    """
    with torch.no_grad():
        # Extract VLM backbone features (without action head)
        vlm_output = policy.get_VLM_selected_layer_output(step_data)

        # Extract diffusion outputs (full inference)
        final_output = policy.get_action(step_data)

        # Package inputs and outputs into a single record
        data_dict = {
            'dataset': dataset_info,  # Dataset name and info
            'step_data': step_data,   # Original input data
            'vlm_output': vlm_output, # VLM backbone features
            'final_output': final_output  # Diffusion action outputs
        }

        return data_dict

def save_all_extraction_data(all_data_list, output_file):
    """Save all extracted data to a single pickle file."""

    # Extract first value of action.right_arm from each sample
    right_arm_first_values = []
    for data in all_data_list:
        if 'action.right_arm' in data['final_output'] and len(data['final_output']['action.right_arm']) > 0:
            right_arm_first_values.append(data['final_output']['action.right_arm'][0])
        else:
            right_arm_first_values.append(None)  # Handle missing data

    # Extract only backbone_features from vlm_output
    backbone_features_list = []
    for data in all_data_list:
        if 'backbone_features' in data['vlm_output']:
            backbone_features_list.append(data['vlm_output']['backbone_features'])
        else:
            backbone_features_list.append(None)  # Handle missing data

    # Combine all data
    combined_data = {
        'dataset': [data['dataset'] for data in all_data_list],
        'step_data': [data['step_data'] for data in all_data_list],
        'backbone_features': backbone_features_list,  # Only backbone_features from vlm_output
        'action_right_arm_first': right_arm_first_values,  # Only first value of action.right_arm
        'extraction_info': {
            'total_samples': len(all_data_list),
            'model_path': MODEL_PATH,
            'embodiment_tag': EMBODIMENT_TAG
        }
    }

    # Move tensors to CPU for saving
    # Convert backbone_features to CPU if they are tensors
    for i in range(len(combined_data['backbone_features'])):
        if combined_data['backbone_features'][i] is not None and torch.is_tensor(combined_data['backbone_features'][i]):
            combined_data['backbone_features'][i] = combined_data['backbone_features'][i].cpu()

    # Convert action.right_arm first values to CPU if they are tensors
    for i in range(len(combined_data['action_right_arm_first'])):
        if combined_data['action_right_arm_first'][i] is not None and torch.is_tensor(combined_data['action_right_arm_first'][i]):
            combined_data['action_right_arm_first'][i] = combined_data['action_right_arm_first'][i].cpu()

    # Save to file
    with open(output_file, 'wb') as f:
        pickle.dump(combined_data, f)

    print(f"💾 Saved all data to {output_file}")
    print(f"   - Total samples: {len(all_data_list)}")
    print(f"   - Data keys: {list(combined_data.keys())}")
    print(f"   - Saved only backbone_features from vlm_output")
    print(f"   - Saved only first value of action.right_arm from each sample")

    return len(all_data_list)

print("✅ Extraction functions defined!")


✅ Extraction functions defined!


In [None]:
# 🚀 BATCH PROCESSING SYSTEM FOR 150K SAMPLES (PARQUET)
# This system handles Colab disconnections by processing data in batches

import json
import glob
from datetime import datetime

# Batch processing configuration
BATCH_SIZE = 10  # Samples per batch; small test value (raise to e.g. 1000 for a full run, adjust based on memory)
TARGET_TOTAL_SAMPLES = 150  # Total target samples; small test value (use 150_000 for the full 150K run)
BATCH_OUTPUT_DIR = os.path.join(OUTPUT_DIR, "batches_parquet")
PROGRESS_FILE = os.path.join(OUTPUT_DIR, "extraction_progress_parquet.json")

# Create batch output directory
os.makedirs(BATCH_OUTPUT_DIR, exist_ok=True)

print(f"🎯 Target: {TARGET_TOTAL_SAMPLES:,} samples")
print(f"📦 Batch size: {BATCH_SIZE:,} samples per batch")
print(f"🗂️  Batch output dir: {BATCH_OUTPUT_DIR}")
print(f"📋 Progress file: {PROGRESS_FILE}")
print(f"💾 Format: Parquet (much more efficient!)")

def load_progress():
    """Load extraction progress from file"""
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE, 'r') as f:
            return json.load(f)
    return {
        'completed_batches': [],
        'total_extracted': 0,
        'last_batch_id': 0,
        'start_time': datetime.now().isoformat()
    }

def save_progress(progress):
    """Save extraction progress to file"""
    progress['last_updated'] = datetime.now().isoformat()
    with open(PROGRESS_FILE, 'w') as f:
        json.dump(progress, f, indent=2)

def get_batch_filename(batch_id):
    """Get standardized batch filename"""
    return os.path.join(BATCH_OUTPUT_DIR, f"batch_{batch_id:04d}.parquet")

def save_batch_data(batch_data, batch_id):
    """Save a single batch to parquet file"""
    
    # Prepare data for DataFrame
    rows = []
    
    for idx, data in enumerate(batch_data):
        # Extract action.right_arm first value
        action_right_arm_first = None
        if 'action.right_arm' in data['final_output'] and len(data['final_output']['action.right_arm']) > 0:
            action_val = data['final_output']['action.right_arm'][0]
            if torch.is_tensor(action_val):
                action_val = action_val.cpu().numpy()
            action_right_arm_first = action_val
        
        # Extract backbone_features
        backbone_features = None
        if 'backbone_features' in data['vlm_output']:
            backbone_feat = data['vlm_output']['backbone_features']
            if torch.is_tensor(backbone_feat):
                backbone_feat = backbone_feat.cpu().numpy()
            # Flatten the backbone features for parquet storage
            if backbone_feat is not None:
                backbone_features = backbone_feat.flatten()
        
        # Create row for DataFrame
        row = {
            'sample_id': idx,
            'global_index': data['dataset'].get('global_index', idx),
            'task_name': data['dataset']['task_name'],
            'sample_index': data['dataset']['sample_index'],
            'total_samples': data['dataset']['total_samples'],
            
            # Store complex data as JSON strings
            'dataset_info': json.dumps(data['dataset']),
            'step_data': json.dumps(data['step_data'], default=str),  # default=str for non-serializable objects
            
            # Store actual feature data
            'backbone_features_shape': json.dumps(list(backbone_feat.shape)) if backbone_features is not None else None,
            'backbone_features': backbone_features.tolist() if backbone_features is not None else None,
            'action_right_arm_first': action_right_arm_first.tolist() if action_right_arm_first is not None else None,
            
            # Batch metadata
            'batch_id': batch_id,
            'extraction_time': datetime.now().isoformat(),
        }
        
        rows.append(row)
    
    # Create DataFrame
    df = pd.DataFrame(rows)
    
    # Save as parquet with compression
    batch_file = get_batch_filename(batch_id)
    df.to_parquet(batch_file, compression='snappy', index=False)
    
    # Save batch metadata separately
    batch_metadata = {
        'batch_id': batch_id,
        'batch_size': len(batch_data),
        'extraction_time': datetime.now().isoformat(),
        'model_path': MODEL_PATH,
        'embodiment_tag': EMBODIMENT_TAG,
        'file_format': 'parquet',
        'compression': 'snappy'
    }
    
    metadata_file = batch_file.replace('.parquet', '_metadata.json')
    with open(metadata_file, 'w') as f:
        json.dump(batch_metadata, f, indent=2)
    
    return batch_file, len(batch_data)

print("✅ Batch processing system initialized!")
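`save_batch_data` flattens each feature array and stores its original shape as a JSON string in `backbone_features_shape`. A minimal sketch of how a reader of the parquet file might undo that flattening is shown here. `restore_features` is a hypothetical helper, the `float32` dtype is an assumption, and the example shape is made up for illustration.

```python
import json

import numpy as np


def restore_features(flat_values, shape_json):
    """Rebuild the original array from its flattened values and JSON-encoded shape."""
    shape = json.loads(shape_json)
    # dtype is an assumption; use whatever dtype the features were extracted in.
    return np.asarray(flat_values, dtype=np.float32).reshape(shape)


# Round trip with a small stand-in array (real features are much larger).
original = np.arange(24, dtype=np.float32).reshape(1, 4, 6)
row = {
    "backbone_features": original.flatten().tolist(),
    "backbone_features_shape": json.dumps(list(original.shape)),
}
restored = restore_features(row["backbone_features"], row["backbone_features_shape"])
```

Storing the shape next to the flat values is what makes the flattening reversible. Without it, only the total element count would survive.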


In [None]:
# 🔄 MAIN BATCH EXTRACTION LOOP
# This processes data in batches and can resume from interruptions

def extract_batches(target_samples=TARGET_TOTAL_SAMPLES, batch_size=BATCH_SIZE):
    """
    Main function to extract data in batches with resume capability
    """
    
    # Load existing progress
    progress = load_progress()
    print(f"📊 Current progress: {progress['total_extracted']:,} samples extracted")
    
    if progress['total_extracted'] >= target_samples:
        print(f"✅ Target already reached! {progress['total_extracted']:,} >= {target_samples:,}")
        return progress
    
    # Load dataset
    task_name = available_tasks[0]  # Using first available task
    task_path = os.path.join(DATASET_ROOT, task_name)
    
    print(f"🔄 Loading dataset: {task_name}")
    dataset = LeRobotSingleDataset(
        dataset_path=task_path,
        modality_configs=modality_config,
        video_backend="decord",
        video_backend_kwargs=None,
        transforms=None,
        embodiment_tag=EMBODIMENT_TAG,
    )
    
    print(f"📊 Dataset size: {len(dataset):,} samples")
    
    # Calculate remaining work
    remaining_samples = target_samples - progress['total_extracted']
    start_index = progress['total_extracted']
    
    print(f"🎯 Need to extract {remaining_samples:,} more samples")
    print(f"▶️  Starting from index: {start_index:,}")
    
    # Process in batches
    current_batch_data = []
    batch_id = progress['last_batch_id'] + 1
    samples_processed = 0
    
    try:
        for i in tqdm(range(start_index, min(start_index + remaining_samples, len(dataset))), 
                      desc="Extracting batches"):
            try:
                # Get sample data
                step_data = dataset[i]
                
                # Create dataset info
                dataset_info = {
                    'task_name': task_name,
                    'sample_index': i,
                    'total_samples': len(dataset),
                    'global_index': start_index + samples_processed  # start_index is fixed, so saved batches are not counted twice
                }
                
                # Extract data
                data_dict = extract_single_step_data(policy, step_data, dataset_info)
                current_batch_data.append(data_dict)
                samples_processed += 1
                
                # Save batch when it reaches batch_size
                if len(current_batch_data) >= batch_size:
                    batch_file, batch_size_actual = save_batch_data(current_batch_data, batch_id)
                    
                    # Update progress
                    progress['completed_batches'].append(batch_id)
                    progress['total_extracted'] += batch_size_actual
                    progress['last_batch_id'] = batch_id
                    save_progress(progress)
                    
                    print(f"✅ Saved batch {batch_id:04d}: {batch_size_actual:,} samples → {progress['total_extracted']:,} total")
                    
                    # Clear batch data and increment batch_id
                    current_batch_data = []
                    batch_id += 1
                    
                    # Check if we've reached our target
                    if progress['total_extracted'] >= target_samples:
                        print(f"🎉 Target reached! {progress['total_extracted']:,} samples extracted")
                        break
                        
            except Exception as e:
                print(f"⚠️  Failed to process sample {i}: {e}")
                continue
        
        # Save any remaining data in the last batch
        if current_batch_data:
            batch_file, batch_size_actual = save_batch_data(current_batch_data, batch_id)
            progress['completed_batches'].append(batch_id)
            progress['total_extracted'] += batch_size_actual
            progress['last_batch_id'] = batch_id
            save_progress(progress)
            print(f"✅ Saved final batch {batch_id:04d}: {batch_size_actual:,} samples → {progress['total_extracted']:,} total")
            
    except KeyboardInterrupt:
        print(f"\n⏸️  Extraction interrupted by user")
        # Save any partial batch
        if current_batch_data:
            batch_file, batch_size_actual = save_batch_data(current_batch_data, batch_id)
            progress['completed_batches'].append(batch_id)
            progress['total_extracted'] += batch_size_actual
            progress['last_batch_id'] = batch_id
            save_progress(progress)
            print(f"💾 Saved partial batch {batch_id:04d}: {batch_size_actual:,} samples")
    
    except Exception as e:
        print(f"❌ Error during extraction: {e}")
        # Save any partial batch
        if current_batch_data:
            try:
                batch_file, batch_size_actual = save_batch_data(current_batch_data, batch_id)
                progress['completed_batches'].append(batch_id)
                progress['total_extracted'] += batch_size_actual
                progress['last_batch_id'] = batch_id
                save_progress(progress)
                print(f"💾 Saved partial batch {batch_id:04d}: {batch_size_actual:,} samples")
            except Exception:
                print("❌ Failed to save partial batch")
    
    # Final summary
    final_progress = load_progress()
    print(f"\n📊 Extraction summary:")
    print(f"   • Total extracted: {final_progress['total_extracted']:,} samples")
    print(f"   • Batches completed: {len(final_progress['completed_batches'])}")
    print(f"   • Progress: {final_progress['total_extracted']/target_samples*100:.1f}%")
    
    return final_progress

print("✅ Batch extraction function ready!")
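The resume logic above can be reduced to a small pattern: persist a checkpoint after every batch and start from it on the next run. The sketch below is a simplified illustration, not the notebook's implementation. It records the next dataset index instead of the count of extracted samples, so skipped samples would not shift the resume point, and `run_with_resume` and the `next_index` key are hypothetical names.

```python
import json
import os


def run_with_resume(items, progress_path, batch_size, process):
    """Process items in batches, recording the next unprocessed index after each batch."""
    next_index = 0
    if os.path.exists(progress_path):
        with open(progress_path) as f:
            next_index = json.load(f)["next_index"]

    while next_index < len(items):
        batch = items[next_index:next_index + batch_size]
        process(batch)
        next_index += len(batch)
        # Write to a temp file first so a crash cannot leave a half-written progress file.
        tmp_path = progress_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": next_index}, f)
        os.replace(tmp_path, progress_path)
    return next_index
```

If `process` raises partway through, the progress file still points at the first unprocessed batch, so calling the function again with the same `progress_path` continues from there.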


In [None]:
# 🔗 MERGE BATCH FILES INTO FINAL OUTPUT
# This combines all batch files into a single final training file

def merge_all_batches(output_filename="probe_training_data_150k.parquet"):
    """
    Merge all parquet batch files into a single final training file
    """
    
    # Find all batch files
    batch_files = glob.glob(os.path.join(BATCH_OUTPUT_DIR, "batch_*.parquet"))
    batch_files.sort()  # Sort to process in order
    
    if not batch_files:
        print("❌ No batch files found to merge!")
        return None, 0  # Callers unpack (final_file, total_samples)
    
    print(f"🔗 Found {len(batch_files)} parquet batch files to merge")
    
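    # Read and combine all batch files
    all_dataframes = []
    total_samples = 0
    # Fallback metadata, used if the first batch or its metadata file cannot be read
    first_batch_metadata = {'model_path': MODEL_PATH, 'embodiment_tag': EMBODIMENT_TAG}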
    
    # Process each batch file
    for i, batch_file in enumerate(tqdm(batch_files, desc="Merging parquet batches")):
        try:
            # Read parquet file
            df = pd.read_parquet(batch_file)
            all_dataframes.append(df)
            total_samples += len(df)
            
            if i == 0:
                # Get metadata from first batch
                metadata_file = batch_file.replace('.parquet', '_metadata.json')
                if os.path.exists(metadata_file):
                    with open(metadata_file, 'r') as f:
                        first_batch_metadata = json.load(f)
                else:
                    first_batch_metadata = {'model_path': MODEL_PATH, 'embodiment_tag': EMBODIMENT_TAG}
            
        except Exception as e:
            print(f"⚠️  Error processing {batch_file}: {e}")
            continue
    
    # Combine all DataFrames
    if not all_dataframes:
        print("❌ No batch files could be read!")
        return None, 0
    print("🔄 Combining all batch DataFrames...")
    final_df = pd.concat(all_dataframes, ignore_index=True)
    
    # Add extraction metadata as additional columns
    final_df['final_total_samples'] = total_samples
    final_df['final_batch_count'] = len(batch_files)
    final_df['final_merge_time'] = datetime.now().isoformat()
    final_df['final_model_path'] = first_batch_metadata.get('model_path', MODEL_PATH)
    final_df['final_embodiment_tag'] = first_batch_metadata.get('embodiment_tag', EMBODIMENT_TAG)
    
    # Save final merged file
    final_output_file = os.path.join(OUTPUT_DIR, output_filename)
    
    print(f"💾 Saving merged parquet file to: {final_output_file}")
    final_df.to_parquet(final_output_file, compression='snappy', index=False)
    
    # Save summary metadata
    summary_metadata = {
        'total_samples': total_samples,
        'model_path': first_batch_metadata.get('model_path', MODEL_PATH),
        'embodiment_tag': first_batch_metadata.get('embodiment_tag', EMBODIMENT_TAG),
        'batch_count': len(batch_files),
        'merge_time': datetime.now().isoformat(),
        'source_batches': [os.path.basename(f) for f in batch_files],
        'file_format': 'parquet',
        'compression': 'snappy',
        'columns': list(final_df.columns),
        'file_size_mb': os.path.getsize(final_output_file) / (1024*1024)
    }
    
    metadata_output_file = final_output_file.replace('.parquet', '_metadata.json')
    with open(metadata_output_file, 'w') as f:
        json.dump(summary_metadata, f, indent=2)
    
    print(f"✅ Merge completed!")
    print(f"   • Total samples: {total_samples:,}")
    print(f"   • Batches merged: {len(batch_files)}")
    print(f"   • Final file: {final_output_file}")
    print(f"   • Metadata file: {metadata_output_file}")
    print(f"   • File size: {summary_metadata['file_size_mb']:.1f} MB")
    print(f"   • Columns: {len(final_df.columns)}")
    
    return final_output_file, total_samples

def check_batch_status():
    """Check current status of batch extraction"""
    progress = load_progress()
    batch_files = glob.glob(os.path.join(BATCH_OUTPUT_DIR, "batch_*.parquet"))
    
    print(f"📊 Batch Extraction Status (Parquet):")
    print(f"   • Progress file: {PROGRESS_FILE}")
    print(f"   • Total extracted: {progress['total_extracted']:,} samples")
    print(f"   • Completed batches: {len(progress['completed_batches'])}")
    print(f"   • Last batch ID: {progress['last_batch_id']}")
    print(f"   • Parquet files on disk: {len(batch_files)}")
    print(f"   • Target progress: {progress['total_extracted']/TARGET_TOTAL_SAMPLES*100:.1f}%")
    
    if batch_files:
        # Calculate total size of batch files
        total_size_mb = sum(os.path.getsize(f) for f in batch_files) / (1024*1024)
        print(f"   • Total batch size: {total_size_mb:.1f} MB")
        print(f"   • Batch files: {sorted([os.path.basename(f) for f in batch_files[:10]])}")  # Show first 10
        if len(batch_files) > 10:
            print(f"     ... and {len(batch_files)-10} more")
    
    return progress

print("✅ Merge functions ready!")
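`check_batch_status` reports the number of completed batches and the number of parquet files on disk separately. If those numbers disagree, a small comparison can show which batch IDs are affected. This is a hedged sketch: `find_batch_mismatches` is a hypothetical helper, and it assumes the `batch_{id:04d}.parquet` naming used by `get_batch_filename`.

```python
import glob
import os
import re


def find_batch_mismatches(batch_dir, completed_batches):
    """Compare batch IDs recorded in the progress file with batch files on disk."""
    on_disk = set()
    for path in glob.glob(os.path.join(batch_dir, "batch_*.parquet")):
        # Only accept names of the form batch_<digits>.parquet.
        match = re.fullmatch(r"batch_(\d+)\.parquet", os.path.basename(path))
        if match:
            on_disk.add(int(match.group(1)))
    recorded = set(completed_batches)
    return {
        "missing_on_disk": sorted(recorded - on_disk),
        "not_in_progress": sorted(on_disk - recorded),
    }
```

In this notebook it could be called as `find_batch_mismatches(BATCH_OUTPUT_DIR, load_progress()['completed_batches'])`. Two empty lists mean the progress file and the batch directory agree.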


In [None]:
# 🚀 START BATCH EXTRACTION
# Run this cell to start or resume batch extraction

print(f"🚀 Starting batch extraction for {TARGET_TOTAL_SAMPLES:,} samples...")
print("⚠️  This will run until interrupted or completed")
print("📝 Progress is automatically saved - you can resume after Colab disconnects")
print("\n" + "="*60)

# Check current status first
check_batch_status()

print("\n" + "="*60)
print("▶️  Starting extraction...")

# Start extraction (this will resume from where it left off)
final_progress = extract_batches()

print("\n🏁 Batch extraction run finished (see the summary above for the actual progress)")
print("🔗 You can now merge the batches or continue later")


In [None]:
# 📊 CHECK EXTRACTION STATUS
# Run this cell anytime to check progress

check_batch_status()


In [None]:
# 🔗 MERGE ALL PARQUET BATCHES INTO FINAL FILE
# Run this cell when you want to combine all parquet batch files

print("🔗 Merging all parquet batch files into final training data...")

# Check status before merging
progress = check_batch_status()

if progress['total_extracted'] < min(1000, TARGET_TOTAL_SAMPLES):  # Minimum threshold, capped at the configured target
    print(f"⚠️  Only {progress['total_extracted']} samples found. Consider extracting more first.")
else:
    print(f"✅ Proceeding with merge of {progress['total_extracted']:,} samples")
    
    # Merge all batches
    final_file, total_samples = merge_all_batches()
    
    if final_file:
        print(f"\n🎉 Final merged parquet file ready!")
        print(f"📁 Location: {final_file}")
        print(f"📊 Total samples: {total_samples:,}")
        
        # Display file size
        file_size_mb = os.path.getsize(final_file) / (1024*1024)
        print(f"💾 File size: {file_size_mb:.1f} MB")
        print("🗜️  Estimated size vs. pickle: ~60-80% smaller (rough estimate, not measured here)")
    else:
        print("❌ Merge failed!")
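
To put the reported file size in context, here is a back-of-the-envelope estimate of the raw feature volume. It assumes the `[1, 300, 2048]` backbone shape shown in the debug output later in this notebook and 4 bytes per value (float32). The stored dtype is an assumption, and the helper name `raw_feature_bytes` is hypothetical.

```python
def raw_feature_bytes(shape, bytes_per_value):
    """Uncompressed size in bytes of one feature tensor with the given shape."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value

# Assumed: shape from the notebook's debug output, float32 storage.
per_sample_bytes = raw_feature_bytes((1, 300, 2048), bytes_per_value=4)
total_bytes = per_sample_bytes * 150_000

print(f"{per_sample_bytes / 1024**2:.2f} MiB per sample")
print(f"{total_bytes / 1024**3:.1f} GiB for 150,000 samples")
```

If the features are stored in a 2-byte format instead, both figures halve. Either way the full 150K-sample set is far larger than Colab RAM, which is presumably why extraction is batched.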


In [None]:
# 🧪 TEST FINAL MERGED PARQUET DATA
# Run this cell to verify the final merged parquet data structure

final_file_path = os.path.join(OUTPUT_DIR, "probe_training_data_150k.parquet")
metadata_file_path = final_file_path.replace('.parquet', '_metadata.json')

if os.path.exists(final_file_path):
    print(f"🔍 Testing merged parquet file: {final_file_path}")
    
    # Load and inspect the parquet data
    df = pd.read_parquet(final_file_path)
    
    print(f"\n📊 Data Structure Analysis:")
    print(f"   • DataFrame shape: {df.shape}")
    print(f"   • Total samples: {len(df):,}")
    print(f"   • Available columns: {list(df.columns)}")
    
    # Show data types
    print(f"\n📋 Column Types:")
    for col in df.columns:
        print(f"   • {col}: {df[col].dtype}")
    
    # Check first few samples
    print(f"\n🔍 Sample Data:")
    for i in range(min(3, len(df))):
        print(f"\n   Sample {i}:")
        
        # Basic info
        print(f"     • Task: {df.iloc[i]['task_name']}")
        print(f"     • Sample index: {df.iloc[i]['sample_index']}")
        print(f"     • Global index: {df.iloc[i]['global_index']}")
        
        # Check backbone features
        backbone_feat = df.iloc[i]['backbone_features']
        backbone_shape = df.iloc[i]['backbone_features_shape']
        if backbone_feat is not None:
            print(f"     • backbone_features type: {type(backbone_feat)}")
            print(f"     • backbone_features length: {len(backbone_feat)}")
            if backbone_shape:
                original_shape = json.loads(backbone_shape)
                print(f"     • original backbone shape: {original_shape}")
        
        # Check action values
        action_val = df.iloc[i]['action_right_arm_first']
        if action_val is not None:
            print(f"     • action.right_arm_first type: {type(action_val)}")
            print(f"     • action.right_arm_first length: {len(action_val)}")
            print(f"     • action.right_arm_first value: {action_val[:3]}...")  # Show first 3 values
    
    # Load and display metadata if available
    if os.path.exists(metadata_file_path):
        print(f"\n📋 Extraction Metadata:")
        with open(metadata_file_path, 'r') as f:
            metadata = json.load(f)
        for key, value in metadata.items():
            if key not in ['source_batches', 'columns']:  # Skip long lists
                print(f"   • {key}: {value}")
    
    # File size info
    file_size_mb = os.path.getsize(final_file_path) / (1024*1024)
    print(f"\n💾 File Information:")
    print(f"   • File size: {file_size_mb:.1f} MB")
    print(f"   • Average size per sample: {file_size_mb*1024/len(df):.1f} KB")
    
    print(f"\n✅ Parquet data structure looks good!")
    print(f"🎯 Ready for probe training with {len(df):,} samples")
    
    # Helper function to reconstruct backbone features
    def reconstruct_backbone_features(row):
        """Helper to reconstruct original backbone features from flattened data"""
        features = row['backbone_features']
        shape_str = row['backbone_features_shape']
        if features is not None and shape_str is not None:
            original_shape = json.loads(shape_str)
            return np.array(features).reshape(original_shape)
        return None
    
    print(f"\n💡 To reconstruct backbone features for training:")
    print(f"   features = np.array(row['backbone_features']).reshape(json.loads(row['backbone_features_shape']))")
    
else:
    print("❌ Final merged parquet file not found!")
    print("🔧 Run the merge cell first to create the final file")
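
The flatten-and-restore convention checked above can be exercised in isolation. The sketch below uses a toy array in place of real backbone features. `flatten_features` and `restore_features` are illustrative names, and the flattening step is an assumption about how the extraction code stores the data.

```python
import json
import numpy as np

def flatten_features(features):
    """Flatten an array to a 1-D list and record its shape as a JSON string."""
    arr = np.asarray(features)
    return arr.reshape(-1).tolist(), json.dumps(list(arr.shape))

def restore_features(flat_values, shape_str):
    """Rebuild the original array from the flat values and the JSON shape string."""
    return np.array(flat_values).reshape(json.loads(shape_str))

# Toy stand-in for a [1, 300, 2048] backbone feature tensor.
original = np.arange(24, dtype=np.float32).reshape(1, 4, 6)
flat, shape_str = flatten_features(original)
restored = restore_features(flat, shape_str)

print(shape_str)
print(restored.shape, np.array_equal(original, restored))
```

Note that the restored array takes whatever dtype NumPy infers from the list (float64 here), so cast with `.astype(np.float32)` if the probe expects a specific precision.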


In [None]:
# 🛠️ UTILITY FUNCTIONS FOR PARQUET DATA
# Helper functions to load and work with the parquet training data

def load_training_data(parquet_file_path, max_samples=None):
    """
    Load training data from parquet file and prepare for ML training
    
    Args:
        parquet_file_path: Path to the merged parquet file
        max_samples: Optional limit on number of samples to load
        
    Returns:
        backbone_features: List of reconstructed backbone feature arrays
        action_targets: List of action.right_arm first values
        metadata: Dictionary with data info
    """
    
    print(f"📁 Loading training data from: {parquet_file_path}")
    
    # Load parquet file
    df = pd.read_parquet(parquet_file_path)
    
    if max_samples:
        df = df.head(max_samples)
        print(f"🔢 Limited to first {max_samples:,} samples")
    
    print(f"📊 Loaded {len(df):,} samples")
    
    # Reconstruct backbone features
    print("🔄 Reconstructing backbone features...")
    backbone_features = []
    action_targets = []
    
    for i, row in tqdm(df.iterrows(), total=len(df), desc="Processing samples"):
        # Reconstruct backbone features from flattened data
        if row['backbone_features'] is not None and row['backbone_features_shape'] is not None:
            features = np.array(row['backbone_features'])
            original_shape = json.loads(row['backbone_features_shape'])
            backbone_feat = features.reshape(original_shape)
            backbone_features.append(backbone_feat)
        else:
            backbone_features.append(None)
        
        # Get action targets
        if row['action_right_arm_first'] is not None:
            action_targets.append(np.array(row['action_right_arm_first']))
        else:
            action_targets.append(None)
    
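    # Create metadata. Array-valued cells need an explicit None check:
    # `if array` raises ValueError for arrays with more than one element.
    first_row = df.iloc[0] if len(df) > 0 else None
    first_shape = first_row['backbone_features_shape'] if first_row is not None else None
    first_action = first_row['action_right_arm_first'] if first_row is not None else None
    metadata = {
        'total_samples': len(df),
        'feature_shape': json.loads(first_shape) if first_shape else None,
        'action_dim': len(first_action) if first_action is not None else None,
        'task_names': df['task_name'].unique().tolist(),
        'file_path': parquet_file_path
    }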
    
    print(f"✅ Training data prepared:")
    print(f"   • Samples: {len(backbone_features):,}")
    print(f"   • Feature shape: {metadata['feature_shape']}")
    print(f"   • Action dim: {metadata['action_dim']}")
    print(f"   • Tasks: {metadata['task_names']}")
    
    return backbone_features, action_targets, metadata

def create_train_test_split(backbone_features, action_targets, test_ratio=0.1, random_seed=42):
    """
    Create train/test split for probe training
    
    Args:
        backbone_features: List of feature arrays
        action_targets: List of action arrays  
        test_ratio: Fraction for test set
        random_seed: Random seed for reproducibility
        
    Returns:
        train_features, test_features, train_targets, test_targets
    """
    
    # Use a local generator so the split is reproducible without reseeding NumPy's global RNG
    rng = np.random.default_rng(random_seed)
    
    # Filter out None values
    valid_indices = [i for i in range(len(backbone_features)) 
                    if backbone_features[i] is not None and action_targets[i] is not None]
    
    valid_features = [backbone_features[i] for i in valid_indices]
    valid_targets = [action_targets[i] for i in valid_indices]
    
    # Create train/test split
    n_samples = len(valid_features)
    n_test = int(n_samples * test_ratio)
    
    indices = rng.permutation(n_samples)
    test_indices = indices[:n_test]
    train_indices = indices[n_test:]
    
    train_features = [valid_features[i] for i in train_indices]
    test_features = [valid_features[i] for i in test_indices]
    train_targets = [valid_targets[i] for i in train_indices]
    test_targets = [valid_targets[i] for i in test_indices]
    
    print(f"📊 Train/Test Split:")
    print(f"   • Training samples: {len(train_features):,}")
    print(f"   • Test samples: {len(test_features):,}")
    print(f"   • Test ratio: {test_ratio:.1%}")
    
    return train_features, test_features, train_targets, test_targets

print("✅ Utility functions ready for parquet data loading!")
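
The lists returned by the helpers above still need to be turned into fixed-size arrays before a probe can be fit. The sketch below is one possible approach, not the notebook's training code. It mean-pools each feature array over the token dimension and fits a least-squares linear probe on toy data. The pooling choice, the helper name `stack_for_probe`, and the toy dimensions are all assumptions.

```python
import numpy as np

def stack_for_probe(feature_list, target_list):
    """Mean-pool each [1, seq_len, hidden] array over tokens and stack into 2-D matrices."""
    pooled = []
    for feat in feature_list:
        feat = np.asarray(feat, dtype=np.float32)
        # Collapse leading dimensions to [tokens, hidden], then average over tokens.
        pooled.append(feat.reshape(-1, feat.shape[-1]).mean(axis=0))
    X = np.stack(pooled)
    y = np.stack([np.asarray(t, dtype=np.float32) for t in target_list])
    return X, y

# Toy data standing in for train_features / train_targets.
rng = np.random.default_rng(0)
toy_features = [rng.normal(size=(1, 5, 8)) for _ in range(10)]
toy_targets = [rng.normal(size=7) for _ in range(10)]

X, y = stack_for_probe(toy_features, toy_targets)

# Linear probe with a bias term, solved by ordinary least squares.
X_aug = np.hstack([X, np.ones((len(X), 1), dtype=X.dtype)])
W, _, _, _ = np.linalg.lstsq(X_aug, y, rcond=None)

print(X.shape, y.shape, W.shape)
```

Pooling discards per-token information, so a probe that needs it would flatten or select specific tokens instead; the stacking step stays the same.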


In [None]:
# Main extraction loop - Extract all data to single file
MAX_SAMPLES_PER_TASK = 150000  # Limit samples per task for testing (set to None for all)

all_extracted_data = []
total_extracted = 0

for task_name in available_tasks:
    print(f"\n🔄 Processing task: {task_name}")

    # Setup task-specific paths
    task_path = os.path.join(DATASET_ROOT, task_name)

    try:
        dataset = LeRobotSingleDataset(
            dataset_path=task_path,
            modality_configs=modality_config,
            video_backend="decord",
            video_backend_kwargs=None,
            transforms=None,
            embodiment_tag=EMBODIMENT_TAG,
        )

        print(f"📊 Dataset loaded: {len(dataset)} samples")

        # Limit samples for testing
        num_samples = len(dataset)
        if MAX_SAMPLES_PER_TASK and num_samples > MAX_SAMPLES_PER_TASK:
            num_samples = MAX_SAMPLES_PER_TASK
            print(f"⚠️  Limited to {num_samples} samples for testing")

        # Process each sample
        task_total = 0

        for i in tqdm(range(num_samples), desc=f"Extracting {task_name}"):
            try:
                # Get sample data
                step_data = dataset[i]

                # Create dataset info
                dataset_info = {
                    'task_name': task_name,
                    'sample_index': i,
                    'total_samples': len(dataset)
                }

                # Extract data in user's format
                data_dict = extract_single_step_data(policy, step_data, dataset_info)
                all_extracted_data.append(data_dict)

                task_total += 1

            except Exception as e:
                print(f"⚠️  Failed to process sample {i}: {e}")
                continue

        total_extracted += task_total
        print(f"✅ Completed {task_name}: {task_total} samples extracted")

    except Exception as e:
        print(f"❌ Failed to process task {task_name}: {e}")
        continue

# Save all data to single file
if all_extracted_data:
    output_file = os.path.join(OUTPUT_DIR, "probe_training_data.pkl")
    total_saved = save_all_extraction_data(all_extracted_data, output_file)

    print(f"\n🎉 Extraction completed!")
    print(f"📊 Total samples extracted: {total_saved}")
    print(f"💾 All data saved to: {output_file}")
else:
    print("❌ No data extracted!")



🔄 Processing task: gr1_arms_waist.TrayToPot
Initialized dataset gr1_arms_waist.TrayToPot with gr1
📊 Dataset loaded: 1950435 samples
⚠️  Limited to 50 samples for testing


Extracting gr1_arms_waist.TrayToPot:   0%|          | 0/50 [00:00<?, ?it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:   4%|▍         | 2/50 [00:00<00:09,  4.83it/s]

Extracting gr1_arms_waist.TrayToPot:   6%|▌         | 3/50 [00:00<00:09,  4.82it/s]
Extracting gr1_arms_waist.TrayToPot:   8%|▊         | 4/50 [00:00<00:09,  4.82it/s]
Extracting gr1_arms_waist.TrayToPot:  10%|█         | 5/50 [00:01<00:09,  4.86it/s]
Extracting gr1_arms_waist.TrayToPot:  12%|█▏        | 6/50 [00:01<00:09,  4.88it/s]
Extracting gr1_arms_waist.TrayToPot:  14%|█▍        | 7/50 [00:01<00:08,  4.87it/s]
Extracting gr1_arms_waist.TrayToPot:  16%|█▌        | 8/50 [00:01<00:08,  4.87it/s]
Extracting gr1_arms_waist.TrayToPot:  18%|█▊        | 9/50 [00:01<00:08,  4.86it/s]
Extracting gr1_arms_waist.TrayToPot:  20%|██        | 10/50 [00:02<00:08,  4.88it/s]
Extracting gr1_arms_waist.TrayToPot:  22%|██▏       | 11/50 [00:02<00:08,  4.83it/s]
Extracting gr1_arms_waist.TrayToPot:  24%|██▍       | 12/50 [00:02<00:07,  4.83it/s]
Extracting gr1_arms_waist.TrayToPot:  26%|██▌       | 13/50 [00:02<00:07,  4.81it/s]
Extracting gr1_arms_waist.TrayToPot:  28%|██▊       | 14/50 [00:02<00:07,  4.82it/s]
Extracting gr1_arms_waist.TrayToPot:  30%|███       | 15/50 [00:03<00:07,  4.86it/s]
Extracting gr1_arms_waist.TrayToPot:  32%|███▏      | 16/50 [00:03<00:06,  4.86it/s]
Extracting gr1_arms_waist.TrayToPot:  34%|███▍      | 17/50 [00:03<00:06,  4.85it/s]
Extracting gr1_arms_waist.TrayToPot:  36%|███▌      | 18/50 [00:03<00:06,  4.83it/s]
Extracting gr1_arms_waist.TrayToPot:  38%|███▊      | 19/50 [00:03<00:06,  4.83it/s]
Extracting gr1_arms_waist.TrayToPot:  40%|████      | 20/50 [00:04<00:06,  4.82it/s]
Extracting gr1_arms_waist.TrayToPot:  42%|████▏     | 21/50 [00:04<00:06,  4.83it/s]
Extracting gr1_arms_waist.TrayToPot:  44%|████▍     | 22/50 [00:04<00:05,  4.76it/s]
Extracting gr1_arms_waist.TrayToPot:  46%|████▌     | 23/50 [00:04<00:05,  4.76it/s]
Extracting gr1_arms_waist.TrayToPot:  48%|████▊     | 24/50 [00:04<00:05,  4.77it/s]
Extracting gr1_arms_waist.TrayToPot:  50%|█████     | 25/50 [00:05<00:05,  4.78it/s]
Extracting gr1_arms_waist.TrayToPot:  52%|█████▏    | 26/50 [00:05<00:05,  4.78it/s]
Extracting gr1_arms_waist.TrayToPot:  54%|█████▍    | 27/50 [00:05<00:04,  4.77it/s]
Extracting gr1_arms_waist.TrayToPot:  56%|█████▌    | 28/50 [00:05<00:04,  4.78it/s]
Extracting gr1_arms_waist.TrayToPot:  58%|█████▊    | 29/50 [00:06<00:04,  4.75it/s]
Extracting gr1_arms_waist.TrayToPot:  60%|██████    | 30/50 [00:06<00:04,  4.74it/s]
Extracting gr1_arms_waist.TrayToPot:  62%|██████▏   | 31/50 [00:06<00:04,  4.74it/s]
Extracting gr1_arms_waist.TrayToPot:  64%|██████▍   | 32/50 [00:06<00:03,  4.63it/s]
Extracting gr1_arms_waist.TrayToPot:  66%|██████▌   | 33/50 [00:06<00:03,  4.66it/s]
Extracting gr1_arms_waist.TrayToPot:  68%|██████▊   | 34/50 [00:07<00:03,  4.67it/s]


Extracting gr1_arms_waist.TrayToPot:  70%|███████   | 35/50 [00:07<00:03,  4.63it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  72%|███████▏  | 36/50 [00:07<00:03,  4.63it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  74%|███████▍  | 37/50 [00:07<00:02,  4.62it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  76%|███████▌  | 38/50 [00:07<00:02,  4.63it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  78%|███████▊  | 39/50 [00:08<00:02,  4.62it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  80%|████████  | 40/50 [00:08<00:02,  4.65it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  82%|████████▏ | 41/50 [00:08<00:01,  4.67it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  84%|████████▍ | 42/50 [00:08<00:01,  4.66it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  86%|████████▌ | 43/50 [00:09<00:01,  4.70it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  88%|████████▊ | 44/50 [00:09<00:01,  4.68it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  90%|█████████ | 45/50 [00:09<00:01,  4.68it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  92%|█████████▏| 46/50 [00:09<00:00,  4.68it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  94%|█████████▍| 47/50 [00:09<00:00,  4.71it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  96%|█████████▌| 48/50 [00:10<00:00,  4.70it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot:  98%|█████████▊| 49/50 [00:10<00:00,  4.67it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])


Extracting gr1_arms_waist.TrayToPot: 100%|██████████| 50/50 [00:10<00:00,  4.75it/s]

🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
🔍 VLM Input shapes:
  input_ids: torch.Size([1, 300])
  attention_mask: torch.Size([1, 300])
  pixel_values: torch.Size([1, 3, 224, 224])
taking eagle output from layer 12
🔍 VLM Raw Output shape (layer 12): torch.Size([1, 300, 2048])
🔍 VLM Final Output shape (after linear): torch.Size([1, 300, 2048])
✅ Completed gr1_arms_waist.TrayToPot: 50 samples extracted
💾 Saved all data to /content/drive/MyDrive/probe_training_data/probe_training_data.pkl
   - Total samples: 50
   - Data keys: ['dataset', 'step_data', 'vlm_output', 'final_output', 'extraction_info']

🎉 Extraction completed!
📊 Total samples extracted: 50
💾 All data saved to: /content/drive/MyDrive/probe_training_data/probe_training_dat




## Try to read the data
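
The saved pickle has five top-level keys (`dataset`, `step_data`, `vlm_output`, `final_output`, `extraction_info`), and printing the values directly dumps whole arrays to the output. As a minimal sketch, the hypothetical helper `summarize` below reports types and array shapes instead. It runs on a small synthetic dictionary with the same keys; the value types and shapes in `fake` are assumptions for illustration (only the `[1, 300, 2048]` VLM output shape and the 7-wide `action.left_arm` rows are taken from the logs), so the real file may be organized differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def summarize(obj, depth=0, max_depth=3):
    """Return a short, shape-oriented description of a nested object."""
    if isinstance(obj, np.ndarray):
        return f"ndarray{obj.shape} {obj.dtype}"
    if isinstance(obj, dict):
        if depth >= max_depth:
            return f"dict({len(obj)} keys)"
        return {k: summarize(v, depth + 1, max_depth) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        if not obj:
            return f"{type(obj).__name__}(0)"
        # Describe only the first element; later ones are assumed to be similar.
        first = summarize(obj[0], depth + 1, max_depth)
        return f"{type(obj).__name__}({len(obj)}) of {first}"
    return type(obj).__name__


# Synthetic stand-in with the same top-level keys as the saved file.
# The value types and shapes here are assumptions, not the real data.
fake = {
    "dataset": ["gr1_arms_waist.TrayToPot"] * 2,
    "step_data": [{"action.left_arm": np.zeros((16, 7))} for _ in range(2)],
    "vlm_output": [np.zeros((1, 300, 2048), dtype=np.float32) for _ in range(2)],
    "final_output": [np.zeros((1, 300, 2048), dtype=np.float32) for _ in range(2)],
    "extraction_info": {"num_samples": 2},
}

# Round-trip through a temporary pickle file to mirror the save/load flow.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "probe_training_data.pkl"
    with open(path, "wb") as f:
        pickle.dump(fake, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

summary = summarize(loaded)
for key, desc in summary.items():
    print(f"{key}: {desc}")
```

To use it on the real file, load it as in the cell below and call `summarize(sample_data)`. Note that `pickle.load` can execute arbitrary code, so only load pickle files you created or trust.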

In [None]:
import pickle

# Load the pickle written by the extraction step and list its top-level keys
with open(output_file, 'rb') as f:
    sample_data = pickle.load(f)
    print(sample_data.keys())



dict_keys(['dataset', 'step_data', 'vlm_output', 'final_output', 'extraction_info'])
[{'action.left_arm': array([[-0.18504381,  0.53471065, -0.21452594, -1.5583756 ,  0.33902884,
         0.23274505,  0.1405673 ],
       [-0.17396569,  0.53386754, -0.20762849, -1.56868684,  0.34674406,
         0.24635494,  0.14377487],
       [-0.18462443,  0.51617193, -0.20209455, -1.57772005,  0.370363  ,
         0.24490809,  0.13682878],
       [-0.17480469,  0.50190681, -0.21100736, -1.58488977,  0.36717892,
         0.2543292 ,  0.14009082],
       [-0.18928099,  0.47840521, -0.19889927, -1.59417081,  0.37155914,
         0.26944125,  0.13190877],
       [-0.18927646,  0.47085866, -0.20024228, -1.58803713,  0.37624788,
         0.27162647,  0.1302073 ],
       [-0.16229463,  0.47033566, -0.21272993, -1.57238758,  0.38687825,
         0.27784669,  0.14654624],
       [-0.15435839,  0.45532805, -0.22741294, -1.57785928,  0.40606904,
         0.28256822,  0.1505723 ],
       [-0.15307736,  0.434756