# OpenVLA DPO Training Demo - Professional Version

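This notebook provides a modular implementation of Direct Preference Optimization (DPO) training for OpenVLA models. Each section lives in its own cell so that components can be debugged and tested one at a time; later sections rely on objects (such as `model_cfg` and `policy_model`) created by earlier ones.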

## Overview
1. **Environment Setup** - Import libraries and configure paths
2. **Configuration** - Set training parameters and model configs
3. **Model Loading** - Load policy and reference models
4. **Data Loading** - Setup datasets and data loaders
5. **DPO Training** - Main training loop
6. **Testing & Debugging** - Utilities for debugging each component


## 1. Environment Setup and Imports

Import all necessary libraries and set up the Python path so that local modules can be found.
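
The path setup can be sketched in a slightly more defensive form. This is only an illustration under assumptions: `add_to_sys_path` is a hypothetical helper, not part of the repository, and it resolves the `../..` segments and skips the append when the directory is already on `sys.path` (useful when a cell is re-run).

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Resolve `directory` and append it to sys.path once; return the resolved path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        # Append (not prepend) so installed packages keep their import precedence.
        sys.path.append(resolved)
    return resolved


# Two levels above the current working directory, as in the setup cell.
repo_root = add_to_sys_path(Path.cwd() / ".." / "..")
print(f"Added to Python path: {repo_root}")
```

Re-running the cell leaves `sys.path` unchanged, which avoids the list growing with duplicate entries during long debugging sessions.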


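# TO DO
## 1. Change the mask used when computing log-probs so that the loss on the separate action token is not counted.
## 2. Assign different weights to the action units within a stream, according to spatial distance.
## 3. Support both offline and online collection of loser trajectories.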
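
TO DO items 1 and 2 both come down to how per-token log-probabilities are reduced to one sequence score. The sketch below is a minimal illustration, not the code in `src.training_utils`: `masked_weighted_logprob`, `loss_mask`, and `token_weights` are hypothetical names, the positions of the masked tokens are made up, and the mapping from spatial distance to weights is left open because the notebook does not define it.

```python
import torch
import torch.nn.functional as F


def masked_weighted_logprob(logits, labels, loss_mask, token_weights=None):
    """Sum per-token log-probs over the positions selected by loss_mask.

    logits:        (batch, seq_len, vocab) model outputs, already aligned with labels
    labels:        (batch, seq_len) target token ids
    loss_mask:     (batch, seq_len) 1.0 for tokens to score, 0.0 for tokens to skip
    token_weights: optional (batch, seq_len) per-token weights
    """
    log_probs = F.log_softmax(logits.float(), dim=-1)
    token_logps = torch.gather(log_probs, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    weights = loss_mask if token_weights is None else loss_mask * token_weights
    return (token_logps * weights).sum(dim=-1)


torch.manual_seed(0)
logits = torch.randn(1, 6, 32)
labels = torch.randint(0, 32, (1, 6))

# Hypothetical layout: positions 2 and 5 hold tokens that should not be scored.
loss_mask = torch.tensor([[1.0, 1.0, 0.0, 1.0, 1.0, 0.0]])
# Hypothetical weights: the second action unit counts half as much as the first.
token_weights = torch.tensor([[1.0, 1.0, 1.0, 0.5, 0.5, 0.5]])

unweighted = masked_weighted_logprob(logits, labels, loss_mask)
weighted = masked_weighted_logprob(logits, labels, loss_mask, token_weights)
```

Because masked positions are multiplied by zero, changing the logits at those positions leaves the sequence score untouched, which is the property item 1 asks for.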

In [None]:
%load_ext autoreload
%autoreload 2

In [2]:
#!/usr/bin/env python3
"""
DPO Training Demo - Environment Setup
"""

import os
import sys
import argparse
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add the parent directories to Python path for imports
current_dir = os.getcwd()
parent_dir = os.path.join(current_dir, "..", "..")
sys.path.append(parent_dir)
print(f"Added to Python path: {parent_dir}")

# Core imports
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from trl.trainer.dpo_trainer import DataCollatorForPreference
import numpy as np
from tqdm import tqdm
from experiments.robot.libero.libero_utils import (
    get_libero_dummy_action,
    get_libero_env,
    get_libero_image,
    quat2axisangle,
    save_rollout_video_CoA,
)

# Local imports
try:
    from src.config import GenerateConfig
    from src.model_utils import setup_vla_model_with_lora, setup_model_and_config, setup_logging_and_environment
    from src.training_utils import train_dpo, compute_log_probs, dpo_loss
    from src.data_process import TrajectoryDataset
    print("✓ Successfully imported local modules")
except ImportError as e:
    print(f"✗ Failed to import local modules: {e}")
    print("Please ensure you're running from the correct directory")

# External imports  
try:
    from experiments.robot.robot_utils import get_model
    print("✓ Successfully imported external modules")
except ImportError as e:
    print(f"✗ Failed to import external modules: {e}")

# Check GPU availability
print(f"\nGPU Information:")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"    Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.1f} GB")

print("\n" + "="*50)
print("Environment setup completed!")
print("="*50)


Added to Python path: /mnt/sda/home/zijianwang/openvla/vla-scripts/DPO/../..


2025-08-21 20:29:49.417604: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-08-21 20:29:49.417798: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-08-21 20:29:49.606938: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-08-21 20:29:50.129667: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


✓ Successfully imported local modules
✓ Successfully imported external modules

GPU Information:
CUDA available: True
GPU count: 4
  GPU 0: NVIDIA GeForce RTX 4090
    Memory: 25.4 GB
  GPU 1: NVIDIA GeForce RTX 4090
    Memory: 25.4 GB
  GPU 2: NVIDIA GeForce RTX 4090
    Memory: 25.4 GB
  GPU 3: NVIDIA RTX A6000
    Memory: 51.0 GB

Environment setup completed!


## 2. Configuration Setup

Configure all training parameters here. To run a different experiment, edit the constants at the top of the cell; they are passed on to the two `GenerateConfig` objects.
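
For orientation, `DPO_BETA` corresponds to the β of the standard DPO objective, which scales the difference between the policy-vs-reference log-ratios of the chosen and rejected sequences. The sketch below shows that textbook form only. It is an assumption that `dpo_loss` in `src.training_utils` resembles it; `dpo_loss_sketch` and its argument names are illustrative.

```python
import torch
import torch.nn.functional as F


def dpo_loss_sketch(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Textbook DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()


# When the policy equals the reference, both log-ratios are zero,
# so the loss is -log(sigmoid(0)) = log(2).
ref_chosen = torch.tensor([-12.0])
ref_rejected = torch.tensor([-15.0])
loss_at_init = dpo_loss_sketch(ref_chosen, ref_rejected, ref_chosen, ref_rejected)

# A policy that raises the chosen log-prob relative to the reference lowers the loss.
loss_improved = dpo_loss_sketch(ref_chosen + 5.0, ref_rejected, ref_chosen, ref_rejected)
```

A larger β makes the loss react more sharply to the same log-ratio gap, which in the usual DPO formulation corresponds to staying closer to the reference model.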


In [4]:
"""
Configuration Setup - Modify parameters here for different experiments
"""

# ====== TRAINING PARAMETERS ======
DEVICE_POLICY = "cuda:0"  # Device for policy model
DEVICE_REF = "cuda:1"     # Device for reference model
MAX_STEPS = 100           # Maximum training steps (reduced for demo)
BATCH_SIZE = 1            # Training batch size
LEARNING_RATE = 0.0005    # Learning rate
DPO_BETA = 0.1            # DPO beta parameter
STREAM_LENGTH = 10        # Stream length for trajectory processing

# ====== WANDB CONFIGURATION ======
USE_WANDB = False         # Set to True to enable Weights & Biases logging
WANDB_PROJECT = "openvla_CoA_DPO_demo"
WANDB_ENTITY = "15652388600"
RUN_ID_NOTE = "notebook_demo"

# ====== PATH CONFIGURATION ======
ROOT_DIR = "/mnt/sda/home/zijianwang"


# Optional: Override default paths (leave empty to use defaults)
PRETRAINED_CHECKPOINT = f"{ROOT_DIR}/openvla/FT_res/openvla-7b-finetuned-libero-10+libero_10_no_noops+b4+lr-0.0005+lora-r48+dropout-0.0--image_aug--2025-07-18_19-26-25"
LORA_PATH = f"{ROOT_DIR}/openvla/adapter_tmp_dir/openvla-7b-finetuned-libero-10+libero_10_no_noops+b4+lr-0.0005+lora-r48+dropout-0.0--image_aug--2025-07-18_19-26-25"
BASE_VLA_PATH = f"{ROOT_DIR}/HF_CACHE/openvla-7b-finetuned-libero-10"
WINNER_TRAJECTORY_PATH = f"{ROOT_DIR}/openvla/vla-scripts/DPO/winner_trajectory"
ADAPTER_TMP_DIR = f"{ROOT_DIR}/openvla/DPO_adapter_tmp_dir"
RUN_ROOT_DIR = f"{ROOT_DIR}/openvla/DPO_res"
TASK_NUM = 1              # Set to a specific task number, or None for all tasks

# Create configuration objects
print("Creating configuration...")

# Policy model configuration
model_cfg = GenerateConfig(
    root_dir=ROOT_DIR,
    device=DEVICE_POLICY,
    max_steps=MAX_STEPS,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
    dpo_beta=DPO_BETA,
    stream_length=STREAM_LENGTH,
    use_wandb=USE_WANDB,
    wandb_project=WANDB_PROJECT,
    wandb_entity=WANDB_ENTITY,
    run_id_note=RUN_ID_NOTE,
    grad_accumulation_steps=1,
    pretrained_checkpoint=PRETRAINED_CHECKPOINT,
    lora_path=LORA_PATH,
    base_vla_path=BASE_VLA_PATH,
    winner_trajectory_path=WINNER_TRAJECTORY_PATH,
    adapter_tmp_dir=ADAPTER_TMP_DIR,
    run_root_dir=RUN_ROOT_DIR,
    task_num=TASK_NUM
)

# Reference model configuration
ref_config = GenerateConfig(
    root_dir=ROOT_DIR,
    device=DEVICE_REF,
    pretrained_checkpoint=PRETRAINED_CHECKPOINT,
    lora_path=LORA_PATH,
    base_vla_path=BASE_VLA_PATH,
    winner_trajectory_path=WINNER_TRAJECTORY_PATH,
    adapter_tmp_dir=ADAPTER_TMP_DIR,
    run_root_dir=RUN_ROOT_DIR
)

print("\n" + "="*50)
print("CONFIGURATION SUMMARY")
print("="*50)
print(f"Policy Device: {model_cfg.device}")
print(f"Reference Device: {ref_config.device}")
print(f"Max Steps: {model_cfg.max_steps}")
print(f"Batch Size: {model_cfg.batch_size}")
print(f"Learning Rate: {model_cfg.learning_rate}")
print(f"DPO Beta: {model_cfg.dpo_beta}")
print(f"Stream Length: {model_cfg.stream_length}")
print(f"Use WandB: {model_cfg.use_wandb}")
print(f"Task Number: {model_cfg.task_num if model_cfg.task_num is not None else 'All tasks'}")  # task 0 is a valid task
print("\nPath Configuration:")
print(f"Root Dir: {model_cfg.root_dir}")
print(f"Pretrained Checkpoint: {model_cfg.pretrained_checkpoint}")
print(f"LoRA Path: {model_cfg.lora_path}")
print(f"Winner Trajectory Path: {model_cfg.winner_trajectory_path}")
print(f"Adapter Tmp Dir: {model_cfg.adapter_tmp_dir}")
print(f"Run Root Dir: {model_cfg.run_root_dir}")
print("="*50)


Creating configuration...

CONFIGURATION SUMMARY
Policy Device: cuda:0
Reference Device: cuda:1
Max Steps: 100
Batch Size: 1
Learning Rate: 0.0005
DPO Beta: 0.1
Stream Length: 10
Use WandB: False
Task Number: 1

Path Configuration:
Root Dir: /mnt/sda/home/zijianwang
Pretrained Checkpoint: /mnt/sda/home/zijianwang/openvla/FT_res/openvla-7b-finetuned-libero-10+libero_10_no_noops+b4+lr-0.0005+lora-r48+dropout-0.0--image_aug--2025-07-18_19-26-25
LoRA Path: /mnt/sda/home/zijianwang/openvla/adapter_tmp_dir/openvla-7b-finetuned-libero-10+libero_10_no_noops+b4+lr-0.0005+lora-r48+dropout-0.0--image_aug--2025-07-18_19-26-25
Winner Trajectory Path: /mnt/sda/home/zijianwang/openvla/vla-scripts/DPO/winner_trajectory
Adapter Tmp Dir: /mnt/sda/home/zijianwang/openvla/DPO_adapter_tmp_dir
Run Root Dir: /mnt/sda/home/zijianwang/openvla/DPO_res


## 3. Model Loading

Load the policy model (with LoRA) and reference model. This section handles device placement and model initialization.


In [5]:
"""
Model Loading Section
"""

print("Starting model loading...")
print("This may take several minutes depending on model size and device speed.")
print("\n" + "-"*30)

# Load policy model with LoRA
print("[1/2] Loading policy model (with LoRA)...")
print(f"Target device: {model_cfg.device}")

try:
    policy_model = setup_vla_model_with_lora(model_cfg)
    print(f"✓ Policy model loaded successfully")
    print(f"Model device: {next(policy_model.parameters()).device}")
    print(f"Model dtype: {next(policy_model.parameters()).dtype}")
    
    # Count parameters
    total_params = sum(p.numel() for p in policy_model.parameters())
    trainable_params = sum(p.numel() for p in policy_model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,}")
    print(f"Trainable ratio: {100 * trainable_params / total_params:.2f}%")
    
except Exception as e:
    print(f"✗ Failed to load policy model: {e}")
    raise

print("\n" + "-"*30)

# Load reference model
print("[2/2] Loading reference model...")
print(f"Target device: {ref_config.device}")

try:
    ref_model = setup_vla_model_with_lora(ref_config)
    print(f"✓ Reference model loaded successfully")
    print(f"Model device: {next(ref_model.parameters()).device}")
    print(f"Model dtype: {next(ref_model.parameters()).dtype}")
    
    # Set reference model to eval mode and freeze parameters
    ref_model.eval()
    for param in ref_model.parameters():
        param.requires_grad = False
    print("✓ Reference model set to eval mode and frozen")
    
except Exception as e:
    print(f"✗ Failed to load reference model: {e}")
    raise

print("\n" + "="*50)
print("MODEL LOADING SUMMARY")
print("="*50)
print(f"Policy Model Device: {next(policy_model.parameters()).device}")
print(f"Reference Model Device: {next(ref_model.parameters()).device}")
# Note: summing requires_grad counts trainable parameter tensors, not individual elements
print(f"Policy Model Trainable: {sum(p.requires_grad for p in policy_model.parameters())} params")
print(f"Reference Model Trainable: {sum(p.requires_grad for p in ref_model.parameters())} params")
print("Models loaded successfully!")
print("="*50)

# Optional: Clear cache to free up memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU cache cleared.")


Starting model loading...
This may take several minutes depending on model size and device speed.

------------------------------
[1/2] Loading policy model (with LoRA)...
Target device: cuda:0
[*] Instantiating Pretrained VLA model
[*] Loading in BF16 with Flash-Attention Enabled


Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00,  1.15it/s]


✓ Policy model loaded successfully
Model device: cuda:0
Model dtype: torch.bfloat16
Total parameters: 7,707,479,616
Trainable parameters: 166,242,432
Trainable ratio: 2.16%

------------------------------
[2/2] Loading reference model...
Target device: cuda:1
[*] Instantiating Pretrained VLA model
[*] Loading in BF16 with Flash-Attention Enabled


Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 10.77it/s]


✓ Reference model loaded successfully
Model device: cuda:1
Model dtype: torch.bfloat16
✓ Reference model set to eval mode and frozen

MODEL LOADING SUMMARY
Policy Model Device: cuda:0
Reference Model Device: cuda:1
Policy Model Trainable: 878 params
Reference Model Trainable: 0 params
Models loaded successfully!
GPU cache cleared.
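
The two "trainable" figures in the output above measure different things. `Trainable parameters: 166,242,432` sums `p.numel()` over trainable tensors, while `Policy Model Trainable: 878 params` sums the boolean `p.requires_grad` and therefore counts tensors. The toy module below (an assumption for illustration, unrelated to OpenVLA) makes the difference visible:

```python
import torch.nn as nn

# Two linear layers; freeze the first one to mimic a partially trainable model.
toy = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
for p in toy[0].parameters():
    p.requires_grad = False

# Number of parameter tensors that will receive gradients (weight + bias of layer 2).
trainable_tensors = sum(p.requires_grad for p in toy.parameters())
# Number of scalar elements inside those tensors (8*2 weights + 2 biases).
trainable_elements = sum(p.numel() for p in toy.parameters() if p.requires_grad)
# All scalar elements, frozen or not (4*8 + 8 + 8*2 + 2).
total_elements = sum(p.numel() for p in toy.parameters())

print(trainable_tensors, trainable_elements, total_elements)  # 2 18 58
```

When comparing runs, the element count is the figure that matches the reported trainable ratio of 2.16%.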


In [6]:
processor, log_file, task_suite, num_tasks_in_suite, resize_size = setup_logging_and_environment(model_cfg, policy_model)

Logging to local log file: ./experiments/logs/DPO-libero_10-openvla-2025_08_21-20_30_29--notebook_demo.txt
[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Task suite: libero_10


In [7]:
task = task_suite.get_task(model_cfg.task_num)
env, task_description = get_libero_env(task, model_cfg.model_family, resolution=256)



In [8]:
print(task_description)

put both the cream cheese box and the butter in the basket


In [9]:
def setup_data_loader(cfg, processor, model, env, task_suite, resize_size):
    """Setup the training data loader."""
    print("[*] Setting up dataset and data loader...")
    
    # Create dataset instance
    dataset = TrajectoryDataset(
        cfg, 
        cfg.winner_trajectory_path, 
        cfg.task_suite_name, 
        processor, 
        env, 
        task_suite,
        device=cfg.device, 
        model=model, 
        img_size=resize_size,
        stream_length=cfg.stream_length,
        task_num=cfg.task_num
    )
    
    # Create data collator
    data_collator = DataCollatorForPreference(pad_token_id=processor.tokenizer.pad_token_id)
    
    # Create data loader
    train_dataloader = DataLoader(
        dataset,
        batch_size=cfg.batch_size,
        shuffle=True,
        collate_fn=data_collator
    )
    
    print(f"Dataset created with {len(dataset)} trajectory pairs")
    return train_dataloader


train_dataloader = setup_data_loader(model_cfg, processor, policy_model, env, task_suite, resize_size)

[*] Setting up dataset and data loader...
Found 212 success trajectories
Task distribution: [('2', 21), ('6', 24), ('5', 36), ('1', 31), ('7', 24), ('0', 27), ('4', 22), ('9', 13), ('3', 9), ('8', 5)]
Dataset created with 31 trajectory pairs
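
The numbers in this output are mutually consistent: the per-task counts sum to the 212 success trajectories, and the 31 pairs equal the count for task `'1'`, which is the configured `TASK_NUM`. The sketch below reproduces that arithmetic from the printed distribution. It assumes one preference pair per success trajectory, which matches the output but is not confirmed by the dataset code, and `pairs_for_task` is a hypothetical helper.

```python
# Task distribution copied from the data loader output above.
task_distribution = [("2", 21), ("6", 24), ("5", 36), ("1", 31), ("7", 24),
                     ("0", 27), ("4", 22), ("9", 13), ("3", 9), ("8", 5)]
counts = dict(task_distribution)
total_trajectories = sum(counts.values())


def pairs_for_task(task_num=None):
    """Expected dataset size, assuming one pair per success trajectory."""
    if task_num is None:
        return total_trajectories
    return counts[str(task_num)]


print(total_trajectories)    # 212
print(pairs_for_task(1))     # 31
```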


In [10]:
print("[*] Verifying data loader setup...")
test_batch = next(iter(train_dataloader))
print(f"Batch keys: {test_batch.keys()}")
print(f"Chosen input shape: {test_batch['chosen_input_ids'].shape}")
print(f"Pixel values shape: {test_batch['pixel_values'].shape}")

[*] Verifying data loader setup...
[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Batch keys: dict_keys(['prompt_input_ids', 'prompt_attention_mask', 'chosen_input_ids', 'chosen_attention_mask', 'rejected_input_ids', 'rejected_attention_mask', 'pixel_values', 'distance'])
Chosen input shape: torch.Size([1, 80])
Pixel values shape: torch.Size([1, 6, 224, 224])
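
A check like the one above can be wrapped in a small helper that fails loudly when a key is missing or the batch dimensions disagree. This is a sketch under assumptions: `check_preference_batch` is hypothetical, the key names are taken from the printed `Batch keys`, and the dummy batch only mirrors the printed shapes (the prompt length of 30 and the rejected length of 80 are made up).

```python
import torch

REQUIRED_KEYS = (
    "prompt_input_ids", "prompt_attention_mask",
    "chosen_input_ids", "chosen_attention_mask",
    "rejected_input_ids", "rejected_attention_mask",
    "pixel_values", "distance",
)


def check_preference_batch(batch):
    """Validate keys and leading dimensions of a preference batch; return the batch size."""
    missing = [key for key in REQUIRED_KEYS if key not in batch]
    if missing:
        raise KeyError(f"batch is missing keys: {missing}")
    batch_size = batch["chosen_input_ids"].shape[0]
    for key in REQUIRED_KEYS:
        if batch[key].shape[0] != batch_size:
            raise ValueError(f"{key} has batch dim {batch[key].shape[0]}, expected {batch_size}")
    for side in ("prompt", "chosen", "rejected"):
        if batch[f"{side}_input_ids"].shape != batch[f"{side}_attention_mask"].shape:
            raise ValueError(f"{side} ids and attention mask differ in shape")
    return batch_size


# Dummy batch with the shapes printed above (values are placeholders).
dummy_batch = {
    "prompt_input_ids": torch.zeros(1, 30, dtype=torch.long),
    "prompt_attention_mask": torch.ones(1, 30, dtype=torch.long),
    "chosen_input_ids": torch.zeros(1, 80, dtype=torch.long),
    "chosen_attention_mask": torch.ones(1, 80, dtype=torch.long),
    "rejected_input_ids": torch.zeros(1, 80, dtype=torch.long),
    "rejected_attention_mask": torch.ones(1, 80, dtype=torch.long),
    "pixel_values": torch.zeros(1, 6, 224, 224),
    "distance": torch.tensor([0.0003], dtype=torch.float64),
}
print(check_preference_batch(dummy_batch))  # 1
```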


In [16]:
print(model_cfg.use_wandb)
try:
    final_adapter_dir = train_dpo(
        model=policy_model, 
        ref_model=ref_model, 
        train_dataloader=train_dataloader, 
        cfg=model_cfg, 
        if_not_demo=model_cfg.use_wandb
    )
    
    print(f"[*] Training completed successfully!")
    print(f"[*] Final adapter saved to: {final_adapter_dir}")
    
except KeyboardInterrupt:
    print("\n[*] Training interrupted by user")
    
except Exception as e:
    print(f"[*] Training failed with error: {e}")
    raise

False
Policy model device: cuda:0
Reference model device: cuda:1


  0%|          | 0/100 [00:00<?, ?it/s]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 0, total_loss: 0.4686, dpo_loss: 0.0000, sft_loss: 4.6859, distance: tensor([0.0003], dtype=torch.float64)


  1%|          | 1/100 [00:30<50:21, 30.52s/it]

Saved adapter to /mnt/sda/home/zijianwang/openvla/DPO_adapter_tmp_dir/openvla-7b+libero_10_no_noops+task1+b1+lr-0.0005+lora-r48+dropout-0.0--2025-08-21_21-07-02--notebook_demo/ckpt-0, batch_idx: 0
[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 1, total_loss: 0.4241, dpo_loss: 0.0000, sft_loss: 4.2409, distance: tensor([0.0874], dtype=torch.float64)


  2%|▏         | 2/100 [00:53<42:48, 26.21s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 2, total_loss: 0.5730, dpo_loss: 0.0000, sft_loss: 5.7303, distance: tensor([0.0887], dtype=torch.float64)


  3%|▎         | 3/100 [01:17<40:29, 25.05s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 3, total_loss: 0.4000, dpo_loss: 0.0000, sft_loss: 3.9997, distance: tensor([0.1125], dtype=torch.float64)


  4%|▍         | 4/100 [01:35<35:40, 22.30s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 4, total_loss: 10.1857, dpo_loss: 9.8125, sft_loss: 3.7316, distance: tensor([0.0323], dtype=torch.float64)


  5%|▌         | 5/100 [02:01<37:26, 23.65s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 5, total_loss: 0.3952, dpo_loss: 0.0000, sft_loss: 3.9516, distance: tensor([0.1022], dtype=torch.float64)


  6%|▌         | 6/100 [02:21<35:11, 22.47s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 6, total_loss: 0.2472, dpo_loss: 0.0000, sft_loss: 2.4720, distance: tensor([0.0262], dtype=torch.float64)


  7%|▋         | 7/100 [02:37<31:18, 20.20s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 7, total_loss: 5.0395, dpo_loss: 4.6875, sft_loss: 3.5203, distance: tensor([0.1216], dtype=torch.float64)


  8%|▊         | 8/100 [02:58<31:20, 20.44s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 8, total_loss: 0.3723, dpo_loss: 0.0000, sft_loss: 3.7234, distance: tensor([0.0973], dtype=torch.float64)


  9%|▉         | 9/100 [03:23<33:05, 21.82s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 9, total_loss: 0.3493, dpo_loss: 0.0000, sft_loss: 3.4927, distance: tensor([0.0527], dtype=torch.float64)


 10%|█         | 10/100 [03:47<33:55, 22.62s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 10, total_loss: 0.3157, dpo_loss: 0.0000, sft_loss: 3.1571, distance: tensor([0.0704], dtype=torch.float64)


 11%|█         | 11/100 [04:06<31:49, 21.45s/it]

Saved adapter to /mnt/sda/home/zijianwang/openvla/DPO_adapter_tmp_dir/openvla-7b+libero_10_no_noops+task1+b1+lr-0.0005+lora-r48+dropout-0.0--2025-08-21_21-07-02--notebook_demo/ckpt-10, batch_idx: 10
[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 11, total_loss: 0.3267, dpo_loss: 0.0000, sft_loss: 3.2669, distance: tensor([0.0426], dtype=torch.float64)


 12%|█▏        | 12/100 [04:31<33:15, 22.68s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 12, total_loss: 0.3372, dpo_loss: 0.0791, sft_loss: 2.5813, distance: tensor([0.1054], dtype=torch.float64)


 13%|█▎        | 13/100 [04:49<30:44, 21.20s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 13, total_loss: 0.2872, dpo_loss: 0.0000, sft_loss: 2.8716, distance: tensor([0.1116], dtype=torch.float64)


 14%|█▍        | 14/100 [05:11<30:49, 21.51s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 14, total_loss: 0.3194, dpo_loss: 0.0000, sft_loss: 3.1939, distance: tensor([0.0153], dtype=torch.float64)


 15%|█▌        | 15/100 [05:33<30:41, 21.67s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 15, total_loss: 0.3448, dpo_loss: 0.0000, sft_loss: 3.4482, distance: tensor([0.1151], dtype=torch.float64)


 16%|█▌        | 16/100 [05:52<29:07, 20.80s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 16, total_loss: 0.2774, dpo_loss: 0.0000, sft_loss: 2.7735, distance: tensor([0.0005], dtype=torch.float64)


 17%|█▋        | 17/100 [06:24<33:28, 24.19s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
batch_idx: 17, total_loss: 0.3051, dpo_loss: 0.0000, sft_loss: 3.0510, distance: tensor([0.0267], dtype=torch.float64)


 18%|█▊        | 18/100 [06:46<31:58, 23.39s/it]

[info] using task orders [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


                                                


[*] Training interrupted by user
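
The logged values are consistent with `total_loss = dpo_loss + 0.1 * sft_loss` on every step shown. The weight of 0.1 is inferred from the numbers in the log, not read from `train_dpo`, so treat it as an assumption. The sketch below parses a few of the logged lines and checks the relation; `parse_losses` is a hypothetical helper.

```python
import re

# A few lines copied from the training log above (distance field omitted).
LOG_LINES = [
    "batch_idx: 0, total_loss: 0.4686, dpo_loss: 0.0000, sft_loss: 4.6859",
    "batch_idx: 4, total_loss: 10.1857, dpo_loss: 9.8125, sft_loss: 3.7316",
    "batch_idx: 7, total_loss: 5.0395, dpo_loss: 4.6875, sft_loss: 3.5203",
    "batch_idx: 12, total_loss: 0.3372, dpo_loss: 0.0791, sft_loss: 2.5813",
]
PATTERN = re.compile(r"total_loss: ([\d.]+), dpo_loss: ([\d.]+), sft_loss: ([\d.]+)")
SFT_WEIGHT = 0.1  # inferred from the log, not taken from the training code


def parse_losses(line):
    """Return (total_loss, dpo_loss, sft_loss) parsed from one log line."""
    total, dpo, sft = (float(value) for value in PATTERN.search(line).groups())
    return total, dpo, sft


residuals = []
for line in LOG_LINES:
    total, dpo, sft = parse_losses(line)
    residuals.append(abs(total - (dpo + SFT_WEIGHT * sft)))

# Logged values are rounded to four decimals, so allow a small tolerance.
assert max(residuals) < 2e-4
```

Under this reading, steps with `dpo_loss: 0.0000` are driven entirely by the SFT term, while the occasional large DPO values (9.8125, 4.6875) dominate `total_loss` on the steps where they occur.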


