# Making the most of your Colab subscription



## Faster GPUs

<p>Users who have purchased one of Colab's paid plans have access to faster GPUs and more memory. You can upgrade your notebook's GPU settings in <code>Runtime &gt; Change runtime type</code> in the menu to select from several accelerator options, subject to availability.</p>
<p>The free-of-charge version of Colab grants access to Nvidia's T4 GPUs subject to quota restrictions and availability.</p>

You can see which GPU you've been assigned at any time by executing the following cell. If the cell prints 'Not connected to a GPU', go to <code>Runtime &gt; Change runtime type</code> in the menu to enable a GPU accelerator, and then re-execute the cell.

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In order to use a GPU with your notebook, select the <code>Runtime &gt; Change runtime type</code> menu and then set the hardware accelerator to the desired option.
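The `!nvidia-smi` syntax in the cell above only works inside IPython or Colab. As a minimal sketch, assuming you want the same check from a plain Python script, the hypothetical helper `gpu_summary` below shells out with `subprocess` instead. The function name and its fallback message are illustrative and not part of Colab.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return nvidia-smi's report, or a fallback message when no GPU is reachable."""
    # On CPU-only runtimes the nvidia-smi binary is usually absent.
    if shutil.which("nvidia-smi") is None:
        return "Not connected to a GPU"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # A non-zero exit code or a 'failed' message means the driver could not be reached.
    if result.returncode != 0 or "failed" in result.stdout:
        return "Not connected to a GPU"
    return result.stdout


print(gpu_summary())
```

Returning a string rather than printing inside the helper keeps it easy to reuse, for example to log the assigned GPU at the start of a long job.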

## More memory

Users who have purchased one of Colab's paid plans have access to high-memory VMs when they are available. More powerful GPUs are always offered with high-memory VMs.
You can see how much memory you have available at any time by running the following code cell. If the cell prints 'Not using a high-RAM runtime', you can enable a high-RAM runtime via <code>Runtime &gt; Change runtime type</code> in the menu: select High-RAM in the Runtime shape toggle, then re-execute the code cell.

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')
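The cell above depends on `psutil`. As a minimal sketch, assuming a Linux runtime such as Colab's, the same figure can be read with the standard library alone. `total_ram_gb` and `runtime_shape` are hypothetical helper names, and the 20 GB cut-off is simply the one used in the cell above.

```python
import os


def total_ram_gb() -> float:
    """Total physical memory in gigabytes (10^9 bytes), read via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1e9


def runtime_shape(ram_gb: float, high_ram_threshold_gb: float = 20.0) -> str:
    """Classify a runtime with the same 20 GB threshold as the psutil cell."""
    return "high-RAM" if ram_gb >= high_ram_threshold_gb else "standard"


ram_gb = total_ram_gb()
print(f"{ram_gb:.1f} GB total -> {runtime_shape(ram_gb)} runtime")
```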

## Longer runtimes

All Colab runtimes are reset after some period of time (sooner if the runtime isn't executing code). Colab Pro and Pro+ users have access to longer runtimes than those who use Colab free of charge.

## Background execution

Colab Pro+ users have access to background execution, where notebooks will continue executing even after you've closed a browser tab. This is always enabled in Pro+ runtimes as long as you have compute units available.


## Relaxing resource limits in Colab Pro

Your resources are not unlimited in Colab. To make the most of Colab, avoid using resources when you don't need them. For example, only use a GPU when required and close Colab tabs when finished.

If you encounter limitations, you can relax those limitations by purchasing more compute units via pay as you go. Anyone can purchase compute units via <a href="https://colab.research.google.com/signup">pay as you go</a>; no subscription is required.

## Send us feedback!

<p>If you have any feedback for us, please let us know. The best way to send feedback is by using the Help &gt; 'Send feedback…' menu. If you encounter usage limits in Colab Pro, consider subscribing to Pro+.</p>
<p>If you encounter errors or other issues with billing &#40;payments&#41; for Colab Pro, Pro+ or pay as you go, please email <a href="mailto:colab-billing@google.com">colab-billing@google.com</a>.</p>

## More resources

### Working with notebooks in Colab
- [Overview of Colab](/notebooks/basic_features_overview.ipynb)
- [Guide to markdown](/notebooks/markdown_guide.ipynb)
- [Importing libraries and installing dependencies](/notebooks/snippets/importing_libraries.ipynb)
- [Saving and loading notebooks in GitHub](https://colab.research.google.com/github/googlecolab/colabtools/blob/main/notebooks/colab-github-demo.ipynb)
- [Interactive forms](/notebooks/forms.ipynb)
- [Interactive widgets](/notebooks/widgets.ipynb)

<a name="working-with-data"></a>
### Working with data
- [Loading data: Drive, Sheets and Google Cloud Storage](/notebooks/io.ipynb)
- [Charts: visualising data](/notebooks/charts.ipynb)
- [Getting started with BigQuery](/notebooks/bigquery.ipynb)

### Machine learning crash course
These are a few of the notebooks from Google's online machine learning course. See the <a href="https://developers.google.com/machine-learning/crash-course/">full course website</a> for more.
- [Intro to Pandas DataFrame](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/pandas_dataframe_ultraquick_tutorial.ipynb)
- [Linear regression with tf.keras using synthetic data](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/cc/exercises/linear_regression_with_synthetic_data.ipynb)


<a name="using-accelerated-hardware"></a>
### Using accelerated hardware
- [TensorFlow with GPUs](/notebooks/gpu.ipynb)
- [TPUs in Colab](/notebooks/tpu.ipynb)

<a name="machine-learning-examples"></a>

## Machine learning examples

To see end-to-end examples of the interactive machine learning analyses that Colab makes possible, take a look at these tutorials using models from <a href="https://tfhub.dev">TensorFlow Hub</a>.

A few featured examples:

- <a href="https://tensorflow.org/hub/tutorials/tf2_image_retraining">Retraining an Image Classifier</a>: Build a Keras model on top of a pre-trained image classifier to distinguish flowers.
- <a href="https://tensorflow.org/hub/tutorials/tf2_text_classification">Text Classification</a>: Classify IMDB film reviews as either <em>positive</em> or <em>negative</em>.
- <a href="https://tensorflow.org/hub/tutorials/tf2_arbitrary_image_stylization">Style Transfer</a>: Use deep learning to transfer style between images.
- <a href="https://tensorflow.org/hub/tutorials/retrieval_with_tf_hub_universal_encoder_qa">Multilingual Universal Sentence Encoder Q&amp;A</a>: Use a machine-learning model to answer questions from the SQuAD dataset.
- <a href="https://tensorflow.org/hub/tutorials/tweening_conv3d">Video Interpolation</a>: Predict what happened in a video between the first and the last frame.


In [None]:
%cd /content
!git clone https://github.com/huggingface/lerobot.git
%cd /content/lerobot
!pip install -e .


/content
Cloning into 'lerobot'...
remote: Enumerating objects: 41105, done.
remote: Counting objects: 100% (131/131), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 41105 (delta 104), reused 65 (delta 65), pack-reused 40974 (from 3)
Receiving objects: 100% (41105/41105), 201.76 MiB | 13.86 MiB/s, done.
Resolving deltas: 100% (26715/26715), done.
Filtering content: 100% (45/45), 69.03 MiB | 20.67 MiB/s, done.
/content/lerobot
Obtaining file:///content/lerobot
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting huggingface-hub<0.36.0,>=0.34.2 (from huggingface-hub[cli,hf-transfer]<0.36.0,>=0.34.2->lerobot==0.4.1)
  Downloading huggingface_hub-0.35.3-py3-none-any.whl.metadata (14 kB)
Collecting av<16.0.0,>=15.0.0 (from lerobot==0.4.1)
  Downlo

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
The token `SAWR` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `SAWR`
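The interactive prompt above needs someone at the keyboard. As a minimal sketch, assuming the token has been stored either as a Colab secret or as an environment variable named `HF_TOKEN` (the name the Hub warning in this notebook's later output refers to), the login can be scripted. `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os
from typing import Optional


def resolve_hf_token(name: str = "HF_TOKEN") -> Optional[str]:
    """Look up a Hugging Face token in Colab secrets first, then in the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get(name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass
    return os.environ.get(name)


token = resolve_hf_token()
if token:
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import login
    login(token=token)
else:
    print("No HF_TOKEN found; run `huggingface-cli login` instead.")
```

Keeping the token in a secret or environment variable also avoids pasting it into a cell, where it could end up saved with the notebook.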


In [None]:
%cd /content/lerobot
!pip install -e ".[smolvla]"

/content/lerobot
Obtaining file:///content/lerobot
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting num2words<0.6.0,>=0.5.14 (from lerobot==0.4.1)
  Downloading num2words-0.5.14-py3-none-any.whl.metadata (13 kB)
Collecting docopt>=0.6.2 (from num2words<0.6.0,>=0.5.14->lerobot==0.4.1)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Downloading num2words-0.5.14-py3-none-any.whl (163 kB)
Building wheels for collected packages: lerobot, docopt
  Building editable for lerobot (pyproject.toml) ... done
  Created whee

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
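The evaluation cell that follows scores the policy with two kinds of numbers: mean absolute error, and the share of samples whose every action dimension lands within a threshold of the ground truth. As a toy NumPy sketch of those two metrics, using made-up values rather than anything from the dataset (`success_rate` is a hypothetical helper name):

```python
import numpy as np


def success_rate(pred: np.ndarray, target: np.ndarray, threshold: float) -> float:
    """Percentage of rows where every dimension is within `threshold` of the target."""
    within = np.abs(pred - target) <= threshold  # boolean array, shape (N, D)
    return float(within.all(axis=1).mean() * 100)


# Made-up values for illustration only: 4 samples, 3 action dimensions.
target = np.array([[0.0, 10.0, 20.0],
                   [5.0, 15.0, 25.0],
                   [1.0, 11.0, 21.0],
                   [2.0, 12.0, 22.0]])
errors = np.array([[1.0, -2.0, 0.5],    # every dimension within 5
                   [6.0, 0.0, 0.0],     # one dimension off by 6
                   [0.0, 0.0, 4.5],     # every dimension within 5
                   [0.0, -12.0, 3.0]])  # one dimension off by 12
pred = target + errors

mae = float(np.abs(pred - target).mean())
print(f"MAE: {mae:.2f}")
print(f"Within 5: {success_rate(pred, target, 5.0):.1f}%")
print(f"Within 10: {success_rate(pred, target, 10.0):.1f}%")
```

Because a single bad dimension fails the whole row, the all-dimensions success rate is much stricter than per-dimension accuracy, so the two can diverge widely.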


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# ======================================================
# CONFIG
# ======================================================
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_PATH = Path("/content/drive/MyDrive/Final_challenge/lerobot_output/smolvla_proper_split_final/best_model")
BATCH_SIZE = 32
device = torch.device("cuda")

print(f"Evaluating model from: {MODEL_PATH}")

# ======================================================
# LOAD MODEL & STATS
# ======================================================
print("\nLoading model...")
policy = SmolVLAPolicy.from_pretrained(MODEL_PATH)
policy.eval()
policy.to(device)

meta = LeRobotDatasetMetadata(DATASET_REPO)
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

# Get normalization parameters
action_stats = meta.stats['action']
action_min = torch.tensor(action_stats['min'])
action_max = torch.tensor(action_stats['max'])
action_mean = torch.tensor(action_stats.get('mean', [(mi + ma) / 2 for mi, ma in zip(action_min, action_max)]))

print(f"\n✓ Model loaded (chunk_size: {policy.config.chunk_size})")
print(f"Action ranges:")
print(f"  Min: {action_min.numpy()}")
print(f"  Max: {action_max.numpy()}")

# ======================================================
# DENORMALIZATION FUNCTION
# ======================================================
def denormalize_actions(normalized_actions, action_min, action_max):
    """
    Denormalize from [-1, 1] to original scale
    normalized = (original - min) / (max - min) * 2 - 1
    original = (normalized + 1) / 2 * (max - min) + min
    """
    return (normalized_actions + 1) / 2 * (action_max - action_min) + action_min

# ======================================================
# LOAD VALIDATION DATASET
# ======================================================
print("\nLoading validation dataset...")

base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]

fps = meta.fps
action_horizon = policy.config.chunk_size
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(action_horizon)],
}

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_dataset = Subset(val_full, val_indices)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, pin_memory=True, num_workers=4)

print(f"✓ Loaded {len(val_dataset)} validation samples")

# ======================================================
# HELPERS
# ======================================================
def fix_keys(batch):
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")
    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])
    for cam_key in ["observation.images.camera1", "observation.images.camera2"]:
        if cam_key in batch:
            img = batch[cam_key]
            if img.ndim == 5:
                batch[cam_key] = img[:, 0]
    return batch

def ensure_task(batch):
    if "task" not in batch:
        B = next(iter(batch.values())).shape[0]
        batch["task"] = ["Pick and place the object."] * B
    return batch

# ======================================================
# EVALUATION
# ======================================================
print("\n" + "="*70)
print("RUNNING EVALUATION")
print("="*70)

all_pred_denorm = []
all_gt_original = []

with torch.no_grad():
    for i, batch in enumerate(val_loader):
        actual_batch_size = batch['action'].shape[0]

        # Save ORIGINAL ground truth (denormalized)
        gt_original = batch['action'].clone()[:, 0, :]  # First action

        # Preprocess
        batch = fix_keys(batch)
        batch = ensure_task(batch)
        batch = preprocessor(batch)
        batch = {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}

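        # Get predictions (normalized).
        # select_action serves actions from an internal queue, so clear it first;
        # otherwise later batches would receive actions planned for an earlier one.
        policy.reset()
        try:
            pred_normalized = policy.select_action(batch)
        except Exception:
            output = policy.forward(batch)
            info = output[1] if isinstance(output, tuple) else output
            # Avoid `a or b` on tensors: truth-testing a multi-element tensor raises.
            pred_normalized = info.get('action')
            if pred_normalized is None:
                pred_normalized = info.get('actions')

        if pred_normalized is None:
            print(f"❌ Failed at batch {i}")
            break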

        if pred_normalized.ndim == 3:
            pred_normalized = pred_normalized[:, 0, :]

        # Trim to actual batch size
        pred_normalized = pred_normalized[:actual_batch_size].cpu()
        gt_original = gt_original[:actual_batch_size]

        # Denormalize predictions manually
        pred_denorm = denormalize_actions(pred_normalized, action_min, action_max)

        all_pred_denorm.append(pred_denorm)
        all_gt_original.append(gt_original)

        if (i + 1) % 10 == 0:
            print(f"  Processed {i+1}/{len(val_loader)} batches")

# Concatenate
all_pred_denorm = torch.cat(all_pred_denorm)
all_gt_original = torch.cat(all_gt_original)

# Trim to exact size
total_expected = len(val_dataset)
all_pred_denorm = all_pred_denorm[:total_expected]
all_gt_original = all_gt_original[:total_expected]

print(f"\n✓ Collected {len(all_pred_denorm)} predictions")
print(f"  Prediction shape: {all_pred_denorm.shape}")
print(f"  Ground truth shape: {all_gt_original.shape}")

# ======================================================
# CALCULATE METRICS
# ======================================================
print("\n" + "="*70)
print("SUCCESS METRICS")
print("="*70)

# Per-dimension errors
print("\nüìä PER-DIMENSION ERRORS (degrees/original units):")
for dim in range(all_pred_denorm.shape[1]):
    mae_dim = torch.abs(all_pred_denorm[:, dim] - all_gt_original[:, dim]).mean()
    mse_dim = ((all_pred_denorm[:, dim] - all_gt_original[:, dim]) ** 2).mean()
    print(f"  Dim {dim}: MAE = {mae_dim:.2f}, MSE = {mse_dim:.2f}")

# Overall stats
mae_overall = torch.abs(all_pred_denorm - all_gt_original).mean()
mse_overall = ((all_pred_denorm - all_gt_original) ** 2).mean()
print(f"\nüìä OVERALL:")
print(f"  MAE: {mae_overall:.2f}")
print(f"  MSE: {mse_overall:.2f}")

# Success rate: "within X degrees for ALL dimensions"
print("\nüìä SUCCESS RATE (all dimensions within threshold):")
for threshold in [1, 2, 5, 10, 20, 50]:
    within_thresh = (torch.abs(all_pred_denorm - all_gt_original) <= threshold).all(dim=1)
    success = within_thresh.float().mean() * 100
    marker = "  ‚Üê YOUR METRIC" if threshold == 5 else ""
    marker2 = "  ‚Üê PASS!" if success >= 50 else ""
    print(f"  Within {threshold:2d}¬∞: {success:.2f}%{marker}{marker2}")

# Per-dimension success (at least shows which joints work)
print("\nüìä PER-DIMENSION SUCCESS (within 5¬∞):")
within_5 = torch.abs(all_pred_denorm - all_gt_original) <= 5.0
success_per_dim = within_5.float().mean(dim=0) * 100
for dim in range(len(success_per_dim)):
    joint_name = f"Joint {dim}" if dim < 5 else "Gripper"
    print(f"  {joint_name}: {success_per_dim[dim]:.2f}%")

# Final verdict
print("\n" + "="*70)
print("FINAL RESULTS")
print("="*70)

# Your original metric: all dims within 5 degrees
all_within_5 = (torch.abs(all_pred_denorm - all_gt_original) <= 5.0).all(dim=1)
final_success = all_within_5.float().mean() * 100

print(f"\nüéØ SUCCESS RATE (all 6 dimensions within 5¬∞): {final_success:.2f}%")

if final_success >= 50:
    print(f"‚úÖ PASS! Model achieves {final_success:.2f}% (requirement: >50%)")
else:
    print(f"‚ùå FAIL. Model achieves {final_success:.2f}% (requirement: >50%)")

    # Find what threshold gives 50%
    for test_thresh in [5, 10, 15, 20, 25, 30, 40, 50]:
        within = (torch.abs(all_pred_denorm - all_gt_original) <= test_thresh).all(dim=1)
        succ = within.float().mean() * 100
        if succ >= 50:
            print(f"\nüí° Alternative: With threshold {test_thresh}¬∞, success = {succ:.2f}%")
            break

print("\n" + "="*70)
print("EVALUATION COMPLETE")
print("="*70)

Evaluating model from: /content/drive/MyDrive/Final_challenge/lerobot_output/smolvla_proper_split_final/best_model

Loading model...
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!


Reducing the number of VLM layers to 16 ...
Loading weights from local directory



✓ Model loaded (chunk_size: 50)
Action ranges:
  Min: [ -93.45587921 -100.           12.97223473   33.53166199  -92.77167511
    0.        ]
  Max: [ 88.01470947   8.12631607 100.          99.49001312 -20.
  32.99845505]

Loading validation dataset...


✓ Loaded 2759 validation samples

RUNNING EVALUATION




  Processed 10/87 batches
  Processed 20/87 batches
  Processed 30/87 batches
  Processed 40/87 batches
  Processed 50/87 batches
  Processed 60/87 batches
  Processed 70/87 batches
  Processed 80/87 batches

✓ Collected 2759 predictions
  Prediction shape: torch.Size([2759, 6])
  Ground truth shape: torch.Size([2759, 6])

SUCCESS METRICS

📊 PER-DIMENSION ERRORS (degrees/original units):
  Dim 0: MAE = 49.30, MSE = 3178.79
  Dim 1: MAE = 49.76, MSE = 3585.71
  Dim 2: MAE = 39.77, MSE = 2334.55
  Dim 3: MAE = 20.26, MSE = 853.21
  Dim 4: MAE = 21.30, MSE = 704.02
  Dim 5: MAE = 7.12, MSE = 68.19

📊 OVERALL:
  MAE: 31.25
  MSE: 1787.41

📊 SUCCESS RATE (all dimensions within threshold):
  Within  1°: 0.00%
  Within  2°: 0.00%
  Within  5°: 0.00%  ← YOUR METRIC
  Within 10°: 0.18%
  Within 20°: 10.51%
  Within 50°: 32.15%

📊 PER-DIMENSION SUCCESS (within 5°):
  Joint 0: 1.63%
  Joint 1: 1.70%
  Joint 2: 2.57%
  Joint 3: 22.94%
  Joint 4: 13.70%
  Gripper: 21.64%

FIN
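Errors of this size can have more than one cause. One possibility worth ruling out, offered as an assumption rather than a diagnosis: if `select_action` serves actions from an internal queue filled by an earlier call, then calling it once per validation batch returns actions planned for a previous batch. The toy class below is a hypothetical stand-in written for illustration, not LeRobot code, and it shows the effect in isolation.

```python
from collections import deque


class ToyChunkedPolicy:
    """Hypothetical policy that plans a chunk of actions and serves them one by one."""

    def __init__(self, chunk_size: int = 3):
        self.chunk_size = chunk_size
        self._queue = deque()

    def reset(self) -> None:
        """Drop any actions still waiting in the queue."""
        self._queue.clear()

    def select_action(self, observation: float) -> float:
        # The "model" only runs when the queue is empty; otherwise the old plan is replayed.
        if not self._queue:
            self._queue.extend(observation + step for step in range(self.chunk_size))
        return self._queue.popleft()


policy = ToyChunkedPolicy(chunk_size=3)

# Without reset: the second and third calls ignore their observations.
stale = [policy.select_action(obs) for obs in (0.0, 100.0, 200.0)]
print("without reset:", stale)

# With reset before each call: every answer depends on the current observation.
policy.reset()
fresh = []
for obs in (0.0, 100.0, 200.0):
    policy.reset()
    fresh.append(policy.select_action(obs))
print("with reset:   ", fresh)
```

In an offline evaluation where consecutive batches are unrelated observations, a queue like this would pair most predictions with the wrong inputs, and the reported error would then depend on the batch size.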

In [None]:
!pip install num2words

Collecting num2words
  Downloading num2words-0.5.14-py3-none-any.whl.metadata (13 kB)
Collecting docopt>=0.6.2 (from num2words)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Downloading num2words-0.5.14-py3-none-any.whl (163 kB)
Building wheels for collected packages: docopt
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13706 sha256=711e6d81aa711280aa52634b275d3bab17fc41b2b4e4ca861c5ba60780122f21
  Stored in directory: /root/.cache/pip/wheels/1a/bf/a1/4cee4f7678c68c5875ca89eaccf460593539805c3906722228
Successfully built docopt
Installing collected packages: docopt, num2words
Successfully installed docopt-0.6.2 num2words-0.5.14


In [None]:
import torch
from lerobot.datasets.lerobot_dataset import LeRobotDataset
import numpy as np

# Check action ranges in dataset
base_ds = LeRobotDataset("lerobot/svla_so101_pickplace", video_backend="pyav")

# Get all actions
all_actions = base_ds.hf_dataset['action']
actions_array = np.array(all_actions)

print("Action statistics from dataset:")
print(f"  Min: {actions_array.min(axis=0)}")
print(f"  Max: {actions_array.max(axis=0)}")
print(f"  Mean: {actions_array.mean(axis=0)}")
print(f"  Std: {actions_array.std(axis=0)}")

# Check if normalized
if actions_array.min() >= -1.1 and actions_array.max() <= 1.1:
    print("\n✓ Actions appear to be NORMALIZED [-1, 1]")
else:
    print("\n✓ Actions are in ORIGINAL SCALE (degrees/radians)")

Action statistics from dataset:
  Min: [ -93.45588  -100.         12.972235   33.531662  -92.771675    0.      ]
  Max: [ 88.01471    8.126316 100.        99.49001  -20.        32.998455]
  Mean: [  8.021016 -55.962566  65.2547    69.18142  -53.420696   6.848858]
  Std: [44.562996 36.484978 29.012281 13.238107 17.764414  8.999029]

✓ Actions are in ORIGINAL SCALE (degrees/radians)
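Since the stored actions are in original units, the manual denormalization in the evaluation cell only recovers them if it inverts the same scheme the policy was trained with. The notebook does not show which scheme that was, so the sketch below is an assumption-driven illustration: it contrasts min/max scaling to [-1, 1] with mean/std standardization, using the statistics printed above (rounded), and shows the error introduced when the wrong inverse is applied.

```python
import numpy as np

# Statistics copied from the printed output above, rounded to three decimals.
a_min = np.array([-93.456, -100.0, 12.972, 33.532, -92.772, 0.0])
a_max = np.array([88.015, 8.126, 100.0, 99.490, -20.0, 32.998])
a_mean = np.array([8.021, -55.963, 65.255, 69.181, -53.421, 6.849])
a_std = np.array([44.563, 36.485, 29.012, 13.238, 17.764, 8.999])


def minmax_normalize(x):
    """Scale original units to [-1, 1] using the per-dimension min and max."""
    return 2 * (x - a_min) / (a_max - a_min) - 1


def minmax_denormalize(z):
    """Inverse of minmax_normalize."""
    return (z + 1) / 2 * (a_max - a_min) + a_min


def meanstd_normalize(x):
    """Standardize original units to zero mean and unit variance."""
    return (x - a_mean) / a_std


def meanstd_denormalize(z):
    """Inverse of meanstd_normalize."""
    return z * a_std + a_mean


action = a_mean + 0.5 * a_std  # an arbitrary action inside the observed range

# Matching inverse: the round trip recovers the action.
roundtrip_minmax = minmax_denormalize(minmax_normalize(action))
roundtrip_meanstd = meanstd_denormalize(meanstd_normalize(action))

# Mismatched inverse: standardized values decoded as if they were min/max scaled.
wrong = minmax_denormalize(meanstd_normalize(action))
mismatch_error = np.abs(wrong - action)
print("Per-dimension error from the mismatch:", mismatch_error.round(2))
```

If the policy's configuration turns out to use mean/std normalization for actions, swapping the inverse (or using the post-processor that `make_pre_post_processors` returns alongside the pre-processor) would be the first thing to try.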


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# ======================================================
# CONFIG
# ======================================================
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_PATH = Path("/content/drive/MyDrive/Final_challenge/lerobot_output/smolvla_proper_split_final/best_model")
BATCH_SIZE = 24
device = torch.device("cuda")

print(f"Evaluating model from: {MODEL_PATH}")

# ======================================================
# LOAD MODEL & STATS
# ======================================================
print("\nLoading model...")
policy = SmolVLAPolicy.from_pretrained(MODEL_PATH)
policy.eval()
policy.to(device)

meta = LeRobotDatasetMetadata(DATASET_REPO)
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

# Get normalization parameters
action_stats = meta.stats['action']
action_min = torch.tensor(action_stats['min'])
action_max = torch.tensor(action_stats['max'])
action_mean = torch.tensor(action_stats.get('mean', [(mi + ma) / 2 for mi, ma in zip(action_min, action_max)]))

print(f"\n✓ Model loaded (chunk_size: {policy.config.chunk_size})")
print(f"Action ranges:")
print(f"  Min: {action_min.numpy()}")
print(f"  Max: {action_max.numpy()}")

# ======================================================
# DENORMALIZATION FUNCTION
# ======================================================
def denormalize_actions(normalized_actions, action_min, action_max):
    """
    Denormalize from [-1, 1] to original scale
    normalized = (original - min) / (max - min) * 2 - 1
    original = (normalized + 1) / 2 * (max - min) + min
    """
    return (normalized_actions + 1) / 2 * (action_max - action_min) + action_min

# ======================================================
# LOAD VALIDATION DATASET
# ======================================================
print("\nLoading validation dataset...")

base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]

fps = meta.fps
action_horizon = policy.config.chunk_size
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(action_horizon)],
}

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_dataset = Subset(val_full, val_indices)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, pin_memory=True, num_workers=4)

print(f"✓ Loaded {len(val_dataset)} validation samples")

# ======================================================
# HELPERS
# ======================================================
def fix_keys(batch):
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")
    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])
    for cam_key in ["observation.images.camera1", "observation.images.camera2"]:
        if cam_key in batch:
            img = batch[cam_key]
            if img.ndim == 5:
                batch[cam_key] = img[:, 0]
    return batch

def ensure_task(batch):
    if "task" not in batch:
        B = next(iter(batch.values())).shape[0]
        batch["task"] = ["Pick and place the object."] * B
    return batch

# ======================================================
# EVALUATION
# ======================================================
print("\n" + "="*70)
print("RUNNING EVALUATION")
print("="*70)

all_pred_denorm = []
all_gt_original = []

with torch.no_grad():
    for i, batch in enumerate(val_loader):
        actual_batch_size = batch['action'].shape[0]

        # Save ORIGINAL ground truth (denormalized)
        gt_original = batch['action'].clone()[:, 0, :]  # First action

        # Preprocess
        batch = fix_keys(batch)
        batch = ensure_task(batch)
        batch = preprocessor(batch)
        batch = {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}

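        # Get predictions (normalized).
        # select_action serves actions from an internal queue, so clear it first;
        # otherwise later batches would receive actions planned for an earlier one.
        policy.reset()
        try:
            pred_normalized = policy.select_action(batch)
        except Exception:
            output = policy.forward(batch)
            info = output[1] if isinstance(output, tuple) else output
            # Avoid `a or b` on tensors: truth-testing a multi-element tensor raises.
            pred_normalized = info.get('action')
            if pred_normalized is None:
                pred_normalized = info.get('actions')

        if pred_normalized is None:
            print(f"❌ Failed at batch {i}")
            break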

        if pred_normalized.ndim == 3:
            pred_normalized = pred_normalized[:, 0, :]

        # Trim to actual batch size
        pred_normalized = pred_normalized[:actual_batch_size].cpu()
        gt_original = gt_original[:actual_batch_size]

        # Denormalize predictions manually
        pred_denorm = denormalize_actions(pred_normalized, action_min, action_max)

        all_pred_denorm.append(pred_denorm)
        all_gt_original.append(gt_original)

        if (i + 1) % 10 == 0:
            print(f"  Processed {i+1}/{len(val_loader)} batches")

# Concatenate
all_pred_denorm = torch.cat(all_pred_denorm)
all_gt_original = torch.cat(all_gt_original)

# Trim to exact size
total_expected = len(val_dataset)
all_pred_denorm = all_pred_denorm[:total_expected]
all_gt_original = all_gt_original[:total_expected]

print(f"\n✓ Collected {len(all_pred_denorm)} predictions")
print(f"  Prediction shape: {all_pred_denorm.shape}")
print(f"  Ground truth shape: {all_gt_original.shape}")

# ======================================================
# CALCULATE METRICS
# ======================================================
print("\n" + "="*70)
print("SUCCESS METRICS")
print("="*70)

# Per-dimension errors
print("\nüìä PER-DIMENSION ERRORS (degrees/original units):")
for dim in range(all_pred_denorm.shape[1]):
    mae_dim = torch.abs(all_pred_denorm[:, dim] - all_gt_original[:, dim]).mean()
    mse_dim = ((all_pred_denorm[:, dim] - all_gt_original[:, dim]) ** 2).mean()
    print(f"  Dim {dim}: MAE = {mae_dim:.2f}, MSE = {mse_dim:.2f}")

# Overall stats
mae_overall = torch.abs(all_pred_denorm - all_gt_original).mean()
mse_overall = ((all_pred_denorm - all_gt_original) ** 2).mean()
print(f"\nüìä OVERALL:")
print(f"  MAE: {mae_overall:.2f}")
print(f"  MSE: {mse_overall:.2f}")

# Success rate: "within X degrees for ALL dimensions"
print("\nüìä SUCCESS RATE (all dimensions within threshold):")
for threshold in [1, 2, 5, 10, 20, 50]:
    within_thresh = (torch.abs(all_pred_denorm - all_gt_original) <= threshold).all(dim=1)
    success = within_thresh.float().mean() * 100
    marker = "  ‚Üê YOUR METRIC" if threshold == 5 else ""
    marker2 = "  ‚Üê PASS!" if success >= 50 else ""
    print(f"  Within {threshold:2d}¬∞: {success:.2f}%{marker}{marker2}")

# Per-dimension success (at least shows which joints work)
print("\nüìä PER-DIMENSION SUCCESS (within 5¬∞):")
within_5 = torch.abs(all_pred_denorm - all_gt_original) <= 5.0
success_per_dim = within_5.float().mean(dim=0) * 100
for dim in range(len(success_per_dim)):
    joint_name = f"Joint {dim}" if dim < 5 else "Gripper"
    print(f"  {joint_name}: {success_per_dim[dim]:.2f}%")

# Final verdict
print("\n" + "="*70)
print("FINAL RESULTS")
print("="*70)

# Your original metric: all dims within 5 degrees
all_within_5 = (torch.abs(all_pred_denorm - all_gt_original) <= 5.0).all(dim=1)
final_success = all_within_5.float().mean() * 100

print(f"\n🎯 SUCCESS RATE (all 6 dimensions within 5°): {final_success:.2f}%")

if final_success >= 50:
    print(f"✅ PASS! Model achieves {final_success:.2f}% (requirement: >50%)")
else:
    print(f"❌ FAIL. Model achieves {final_success:.2f}% (requirement: >50%)")

    # Find what threshold gives 50%
    for test_thresh in [5, 10, 15, 20, 25, 30, 40, 50]:
        within = (torch.abs(all_pred_denorm - all_gt_original) <= test_thresh).all(dim=1)
        succ = within.float().mean() * 100
        if succ >= 50:
            print(f"\n💡 Alternative: With threshold {test_thresh}°, success = {succ:.2f}%")
            break

print("\n" + "="*70)
print("EVALUATION COMPLETE")
print("="*70)

Evaluating model from: /content/drive/MyDrive/Final_challenge/lerobot_output/smolvla_proper_split_final/best_model

Loading model...
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
Reducing the number of VLM layers to 16 ...
Loading weights from local directory

✓ Model loaded (chunk_size: 50)
Action ranges:
  Min: [ -93.45587921 -100.           12.97223473   33.53166199  -92.77167511
    0.        ]
  Max: [ 88.01470947   8.12631607 100.          99.49001312 -20.
  32.99845505]

Loading validation dataset...
✓ Loaded 2759 validation samples

RUNNING EVALUATION




  Processed 10/115 batches
  Processed 20/115 batches
  Processed 30/115 batches
  Processed 40/115 batches
  Processed 50/115 batches
  Processed 60/115 batches
  Processed 70/115 batches
  Processed 80/115 batches
  Processed 90/115 batches
  Processed 100/115 batches
  Processed 110/115 batches

✓ Collected 2759 predictions
  Prediction shape: torch.Size([2759, 6])
  Ground truth shape: torch.Size([2759, 6])

SUCCESS METRICS

📊 PER-DIMENSION ERRORS (degrees/original units):
  Dim 0: MAE = 92.79, MSE = 12079.82
  Dim 1: MAE = 53.39, MSE = 4122.37
  Dim 2: MAE = 38.75, MSE = 2219.23
  Dim 3: MAE = 20.81, MSE = 623.09
  Dim 4: MAE = 44.29, MSE = 2818.26
  Dim 5: MAE = 15.69, MSE = 440.86

📊 OVERALL:
  MAE: 44.28
  MSE: 3717.27

📊 SUCCESS RATE (all dimensions within threshold):
  Within  1°: 0.00%
  Within  2°: 0.00%
  Within  5°: 0.00%  ← YOUR METRIC
  Within 10°: 0.04%
  Within 20°: 3.01%
  Within 50°: 18.67%

📊 PER-DIMENSION SUCCESS (within 5°):
  Joint 0: 1.70

In [None]:
!python -m lerobot.scripts.lerobot_edit_dataset \
        --repo_id lerobot/svla_so101_pickplace \
        --operation.type split \
        --operation.splits '{"train": 0.8, "val": 0.2}' \
        --push_to_hub "Sa74ll/sso101_pickplace"


Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 286, in <module>
    main()
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 282, in main
    edit_dataset()
  File "/content/lerobot/src/lerobot/configs/parser.py", line 232, in wrapper_inner
    cfg = draccus.parse(config_class=argtype, config_path=config_path, args=cli_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/draccus/argparsing.py", line 211, in parse
    return parser.parse_args(args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/draccus/argparsing.py", line 102, in parse_args
    args, _ = self.parse_known_args(args, namespace, is_parse_args=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [None]:
!python -m lerobot.scripts.lerobot_edit_dataset \
        --repo_id lerobot/svla_so101_pickplace \
        --operation.type split \
        --operation.splits '{"train": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], "val": [14, 15]}'


INFO 2025-11-05 07:05:32 _dataset.py:184 Splitting dataset lerobot/svla_so101_pickplace with splits: {'train': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 'val': [14, 15]}
INFO 2025-11-05 07:05:32 et_tools.py:190 Creating split 'train' with 14 episodes
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 286, in <module>
    main()
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 282, in main
    edit_dataset()
  File "/content/lerobot/src/lerobot/configs/parser.py", line 233, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 268, in edit_dataset
    handle_split(cfg)
  File "/content/lerobot/src/lerobot/scripts/lerobot_edit_dataset.py", line 185, in handle_split
    split_d

In [None]:
i = 2
fps = 30
s = i*fps / 0
print(s)

ZeroDivisionError: division by zero

In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# -------------------------------------------------
# CONFIG
# -------------------------------------------------
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_PATH = Path("/content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1")
BATCH_SIZE = 24
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Loading policy from: {MODEL_PATH}")
policy = SmolVLAPolicy.from_pretrained(MODEL_PATH)
policy.to(device)
policy.eval()

# we need fps + stats again
meta = LeRobotDatasetMetadata(DATASET_REPO)
fps = meta.fps
chunk_size = policy.config.chunk_size
print("chunk_size:", chunk_size)
print("fps:", fps)

# same preprocessor as training → same normalization
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

# -------------------------------------------------
# BUILD VAL DATASET (episodes 40-49)
# -------------------------------------------------
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]

# timestamps must MATCH the model
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(chunk_size)],
}

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_ds = Subset(val_full, val_indices)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)

print(f"val samples: {len(val_ds)}")


# -------------------------------------------------
# helpers – SAME as training
# -------------------------------------------------
def fix_keys(batch: dict) -> dict:
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")

    # pad missing camera
    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])

    # sometimes (B,1,C,H,W)
    for cam_key in ["observation.images.camera1", "observation.images.camera2"]:
        if cam_key in batch and batch[cam_key].ndim == 5:
            batch[cam_key] = batch[cam_key][:, 0]

    return batch


def ensure_task(batch: dict) -> dict:
    if "task" not in batch:
        # infer batch size from any tensor
        bs = None
        for v in batch.values():
            if torch.is_tensor(v):
                bs = v.shape[0]
                break
        if bs is None:
            bs = 1
        batch["task"] = ["Pick and place the object."] * bs
    return batch


# -------------------------------------------------
# EVAL LOOP
# -------------------------------------------------
all_pred = []
all_gt = []

policy.eval()
with torch.no_grad():
    for i, raw_batch in enumerate(val_loader):
        # 1) same fixes as training
        raw_batch = fix_keys(raw_batch)
        raw_batch = ensure_task(raw_batch)

        # 2) run through preprocessor → this normalizes actions
        proc_batch = preprocessor(raw_batch)

        # 3) move to device
        for k, v in list(proc_batch.items()):
            if torch.is_tensor(v):
                proc_batch[k] = v.to(device)

        # 4) predict a full chunk (B, T, A) WITHOUT changing the model file
        pred_actions = policy.predict_action_chunk(proc_batch)  # (B, chunk, action_dim)

        # 5) get GT actions from preprocessed batch (same normalization)
        gt_actions = proc_batch["action"]  # (B, chunk, action_dim)

        # we can compare only the first action (T=0)
        pred_first = pred_actions[:, 0, :].cpu()
        gt_first = gt_actions[:, 0, :].cpu()

        all_pred.append(pred_first)
        all_gt.append(gt_first)

        if (i + 1) % 20 == 0:
            print(f"processed {i+1}/{len(val_loader)} batches")

# cat
all_pred = torch.cat(all_pred, dim=0)
all_gt = torch.cat(all_gt, dim=0)

print("pred shape:", all_pred.shape)
print("gt shape:", all_gt.shape)

# -------------------------------------------------
# METRICS (same as before)
# -------------------------------------------------
abs_err = torch.abs(all_pred - all_gt)
mae = abs_err.mean()
mse = ((all_pred - all_gt) ** 2).mean()
print(f"\nMAE (normalized): {mae:.4f}")
print(f"MSE (normalized): {mse:.4f}")

# success rate in normalized space: all dims within 0.1 of the ground truth
print("\nSuccess rate (all dims within 0.1 in normalized space):")
within = (abs_err <= 0.1).all(dim=1)
success = within.float().mean() * 100
print(f"  {success:.2f}%")


Loading policy from: /content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
Reducing the number of VLM layers to 16 ...
Loading weights from local directory
chunk_size: 50
fps: 30
val samples: 2759




processed 20/115 batches
processed 40/115 batches
processed 60/115 batches
processed 80/115 batches
processed 100/115 batches
pred shape: torch.Size([2759, 6])
gt shape: torch.Size([2759, 6])

MAE (normalized): 0.2042
MSE (normalized): 0.0869

Success rate (all dims within 0.1 in normalized space):
  0.72%
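
To read the 0.1 threshold in physical units: if the actions are z-score normalized, an error of 0.1 in normalized space corresponds to `0.1 * std` in raw units for each joint. The sketch below is a rough illustration only. It assumes z-score normalization, copies the per-joint std values from the `action std` line that a later cell in this notebook prints, and uses a hypothetical helper name `normalized_tol_to_raw`.

```python
# Per-joint action std, copied from the "action std" output printed later in this notebook.
ACTION_STD = [44.5630, 36.4851, 29.0120, 13.2381, 17.7644, 8.9990]


def normalized_tol_to_raw(tol_norm, stds):
    """Convert a z-score-space tolerance to raw units.

    With x_norm = (x - mean) / std, an absolute error of tol_norm in
    normalized space equals tol_norm * std in raw units (the mean cancels).
    """
    return [tol_norm * s for s in stds]


raw_tol = normalized_tol_to_raw(0.1, ACTION_STD)
for j, t in enumerate(raw_tol):
    print(f"joint {j}: 0.1 normalized = {t:.2f} raw units")
```

Under this assumption the same normalized threshold is looser for high-variance joints and tighter for low-variance ones, which is worth keeping in mind when comparing it with a fixed threshold in degrees.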


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# -----------------------------------------------------
# CONFIG
# -----------------------------------------------------
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_DIR = Path("/content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1")
BATCH_SIZE = 24
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Loading policy from: {MODEL_DIR}")
policy = SmolVLAPolicy.from_pretrained(MODEL_DIR)
policy.to(device).eval()

meta = LeRobotDatasetMetadata(DATASET_REPO)
fps = meta.fps
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

# 👇 this time we read mean/std, not only min/max
act_stats = meta.stats["action"]
act_mean = torch.tensor(act_stats["mean"], dtype=torch.float32)
act_std  = torch.tensor(act_stats["std"], dtype=torch.float32)
act_min  = torch.tensor(act_stats["min"], dtype=torch.float32)
act_max  = torch.tensor(act_stats["max"], dtype=torch.float32)

print("chunk_size (action horizon):", policy.config.chunk_size)
print("fps:", fps)
print("action mean:", act_mean)
print("action std :", act_std)
print("val split: episodes >= 40")

# -----------------------------------------------------
# BUILD VAL DATASET (with correct timestamps)
# -----------------------------------------------------
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]

action_horizon = policy.config.chunk_size
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(action_horizon)],
}

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_ds = Subset(val_full, val_indices)

val_loader = DataLoader(
    val_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    pin_memory=True,
    num_workers=4,
)

print(f"val samples: {len(val_ds)}")

# -----------------------------------------------------
# HELPERS
# -----------------------------------------------------
def fix_keys(batch: dict) -> dict:
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")

    # pad missing camera
    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])

    # drop extra time dim
    for cam in ["observation.images.camera1", "observation.images.camera2"]:
        if cam in batch and batch[cam].ndim == 5:
            batch[cam] = batch[cam][:, 0]
    return batch

def ensure_task(batch: dict) -> dict:
    if "task" not in batch:
        B = next(v.shape[0] for v in batch.values() if torch.is_tensor(v))
        batch["task"] = ["Pick and place the object."] * B
    return batch

def unnormalize_from_z(pred_norm: torch.Tensor) -> torch.Tensor:
    """
    inverse of: x_norm = (x - mean) / std
    """
    mean = act_mean.to(pred_norm.device)
    std = act_std.to(pred_norm.device)
    return pred_norm * std + mean

# -----------------------------------------------------
# EVAL LOOP
# -----------------------------------------------------
all_preds = []
all_gts = []

print("\nStarting offline eval (z-score unnorm, 5% per-joint)...")

with torch.no_grad():
    for i, raw in enumerate(val_loader):
        # GT: original scale
        gt = raw["action"][:, 0, :]   # (B, D)

        raw = fix_keys(raw)
        raw = ensure_task(raw)
        batch = preprocessor(raw)
        batch = {k: (v.to(device) if torch.is_tensor(v) else v) for k, v in batch.items()}

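        # NOTE: select_action appears to serve actions from an internal queue that is
        # refilled only when it runs empty, so successive calls may return later steps
        # of an earlier chunk rather than a fresh prediction for this batch.
        pred_norm = policy.select_action(batch)   # (B, D) in normalized space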
        # align sizes
        Bact = gt.shape[0]
        pred_norm = pred_norm[:Bact].cpu()
        gt = gt[:Bact].cpu()

        # proper unnorm
        pred = unnormalize_from_z(pred_norm)

        all_preds.append(pred)
        all_gts.append(gt)

        if (i + 1) % 20 == 0:
            print(f"  processed {i+1}/{len(val_loader)} batches")

all_preds = torch.cat(all_preds, dim=0)
all_gts   = torch.cat(all_gts,   dim=0)

N = min(all_preds.shape[0], all_gts.shape[0])
all_preds = all_preds[:N]
all_gts   = all_gts[:N]

print(f"\nCollected predictions: {all_preds.shape}, GT: {all_gts.shape}")

# -----------------------------------------------------
# METRICS
# -----------------------------------------------------
mae = torch.abs(all_preds - all_gts).mean()
mse = ((all_preds - all_gts) ** 2).mean()
print(f"\nMAE (overall): {mae:.4f}")
print(f"MSE (overall): {mse:.4f}")

# 5% of joint range
joint_range = (act_max - act_min).to(all_preds.device)
per_joint_tol = 0.05 * joint_range   # (D,)

abs_err = torch.abs(all_preds - all_gts)
within_5pct = abs_err <= per_joint_tol

strict = within_5pct.all(dim=1).float().mean() * 100
print(f"\n🎯 strict success (all joints within 5%): {strict:.2f}%")

print("\nPer-joint 5% success:")
per_joint = within_5pct.float().mean(dim=0) * 100
for j, s in enumerate(per_joint):
    print(f"  joint {j}: {s:.2f}%")

print("\nDone.")


Loading policy from: /content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
Reducing the number of VLM layers to 16 ...
Loading weights from local directory
chunk_size (action horizon): 50
fps: 30
action mean: tensor([  8.0211, -55.9624,  65.2557,  69.1819, -53.4199,   6.8489])
action std : tensor([44.5630, 36.4851, 29.0120, 13.2381, 17.7644,  8.9990])
val split: episodes >= 40
val samples: 2759

Starting offline eval (z-score unnorm, 5% per-joint)...




  processed 20/115 batches
  processed 40/115 batches
  processed 60/115 batches
  processed 80/115 batches
  processed 100/115 batches

Collected predictions: torch.Size([2759, 6]), GT: torch.Size([2759, 6])

MAE (overall): 26.9145
MSE (overall): 1468.2327

🎯 strict success (all joints within 5%): 1.63%

Per-joint 5% success:
  joint 0: 5.94%
  joint 1: 16.17%
  joint 2: 21.64%
  joint 3: 27.11%
  joint 4: 14.10%
  joint 5: 46.03%

Done.
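
The two steps this cell relies on, inverting the z-score and building a per-joint tolerance from 5% of each joint's range, can be checked on plain lists. This is a minimal sketch, not the LeRobot implementation. The mean/std values are the ones printed above; the min/max values are copied from the "Action ranges" output of an earlier run and are assumed to describe the same dataset. The helper names and the sample `gt`/`pred` vectors are made up for illustration.

```python
MEAN = [8.0211, -55.9624, 65.2557, 69.1819, -53.4199, 6.8489]
STD = [44.5630, 36.4851, 29.0120, 13.2381, 17.7644, 8.9990]
ACT_MIN = [-93.45587921, -100.0, 12.97223473, 33.53166199, -92.77167511, 0.0]
ACT_MAX = [88.01470947, 8.12631607, 100.0, 99.49001312, -20.0, 32.99845505]


def normalize(x):
    """Z-score per joint: x_norm = (x - mean) / std."""
    return [(xi - m) / s for xi, m, s in zip(x, MEAN, STD)]


def unnormalize(z):
    """Inverse of normalize: x = x_norm * std + mean."""
    return [zi * s + m for zi, m, s in zip(z, MEAN, STD)]


def within_tolerance(pred, gt, frac=0.05):
    """Per-joint pass/fail, where the tolerance is frac * (max - min) of that joint."""
    tol = [frac * (hi - lo) for lo, hi in zip(ACT_MIN, ACT_MAX)]
    return [abs(p - g) <= t for p, g, t in zip(pred, gt, tol)]


gt = [10.0, -50.0, 60.0, 70.0, -55.0, 5.0]      # made-up target, raw units
pred = [30.0, -49.0, 61.0, 71.0, -54.0, 6.0]    # made-up prediction, raw units

round_trip = unnormalize(normalize(gt))          # should reproduce gt
flags = within_tolerance(pred, gt)

print(flags)       # joint 0 is off by 20, more than 5% of its range
print(all(flags))  # strict success needs every joint to pass
```

Comparing in raw units only makes sense if the ground truth is also in raw units, which is why the cell above reads `gt` from the batch before the preprocessor runs.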


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

# ---- LeRobot stuff (same as training) ----
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# =====================================================
# CONFIG
# =====================================================
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_DIR = "/content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1"  # your best
BATCH_SIZE = 24
MAX_VAL_BATCHES = None   # set to e.g. 50 to make it faster

# how many future actions to evaluate (model was trained on 50, but we don't need all 50)
K_EVAL = 10   # <-- change this if you like

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Loading policy from: {MODEL_DIR}")

# =====================================================
# 1) LOAD MODEL + METADATA
#    (from lerobot/policies/smolvla/modeling_smolvla.py)
#    we MUST load the same policy we trained
# =====================================================
policy = SmolVLAPolicy.from_pretrained(MODEL_DIR)
policy.to(device).eval()

# we need dataset stats to (re)build the same preprocessor
meta = LeRobotDatasetMetadata(DATASET_REPO)
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

# we'll do z-score unnorm because that's what LeRobot uses inside the processor
action_stats = meta.stats["action"]
action_mean = torch.tensor(action_stats["mean"])  # shape (6,)
action_std  = torch.tensor(action_stats["std"])   # shape (6,)

print(f"chunk_size (action horizon): {policy.config.chunk_size}")
fps = meta.fps
print("fps:", fps)

# =====================================================
# 2) REBUILD THE SAME delta_timestamps AS TRAINING
#    this was the thing that fixed the 227 vs 178 token bug
# =====================================================
action_horizon = policy.config.chunk_size
action_dts = [i / fps for i in range(action_horizon)]
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": action_dts,
}

# =====================================================
# 3) LOAD VAL SPLIT (episodes >= 40)
#    same logic as you trained with
# =====================================================
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]
print("val split: episodes >= 40")
print("val samples:", len(val_indices))

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_ds = Subset(val_full, val_indices)
val_loader = DataLoader(
    val_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    pin_memory=True,
    num_workers=4,  # set to 0 to surface video-decoding errors in the main process
)

# =====================================================
# 4) HELPERS (same as train)
# =====================================================
def fix_keys(batch: dict) -> dict:
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")

    # pad missing cam
    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])

    # squeeze time dim if (B,1,C,H,W)
    for cam_key in ["observation.images.camera1", "observation.images.camera2"]:
        if cam_key in batch:
            img = batch[cam_key]
            if img.ndim == 5:
                batch[cam_key] = img[:, 0]
    return batch

def ensure_task(batch: dict) -> dict:
    if "task" not in batch:
        # find batch size from a tensor
        bsz = None
        for v in batch.values():
            if torch.is_tensor(v):
                bsz = v.shape[0]
                break
        if bsz is None:
            bsz = 1
        batch["task"] = ["Pick and place the object."] * bsz
    return batch

# =====================================================
# 5) EVAL LOOP
#    main idea:
#    - get GT actions (B, 50, 6) from dataset
#    - run policy.predict_action_chunk(...) → (B, 50, 6)
#      (this is in the model file you pasted)
#    - compare only first K_EVAL steps
# =====================================================
print("\nStarting offline eval (z-score unnorm, 5% per-joint)...\n")

all_preds = []
all_gts   = []

with torch.no_grad():
    for bi, batch in enumerate(val_loader):
        # original GT in dataset space (already 50 steps because of delta_timestamps)
        gt_actions = batch["action"]  # (B, 50, 6)

        # we will compare only first K_EVAL
        T = min(K_EVAL, gt_actions.shape[1], action_horizon)

        # fix + preprocess
        batch = fix_keys(batch)
        batch = ensure_task(batch)
        proc_batch = preprocessor(batch)
        proc_batch = {k: (v.to(device) if torch.is_tensor(v) else v) for k, v in proc_batch.items()}

        # ---- the important line: use policy.predict_action_chunk ----
        # defined in lerobot/policies/smolvla/modeling_smolvla.py
        pred_actions_norm = policy.predict_action_chunk(proc_batch)  # (B, 50, 6) in *normalized* space

        # move to cpu
        pred_actions_norm = pred_actions_norm[:, :T, :].cpu()
        gt_actions        = gt_actions[:, :T, :].cpu()

        # unnormalize using z-score: x = x * std + mean
        # (this is how the training preprocessor normalized actions)
        pred_actions = pred_actions_norm * action_std + action_mean

        all_preds.append(pred_actions)
        all_gts.append(gt_actions)

        if MAX_VAL_BATCHES is not None and (bi + 1) >= MAX_VAL_BATCHES:
            break

        if (bi + 1) % 20 == 0:
            print(f"  processed {bi+1}/{len(val_loader)} batches")

# stack → shapes: (N, T, 6)
all_preds = torch.cat(all_preds, dim=0)
all_gts   = torch.cat(all_gts, dim=0)

print(f"\nCollected predictions: {all_preds.shape}, GT: {all_gts.shape}")

# =====================================================
# 6) METRICS
#    - MAE / MSE over all (N,T,6)
#    - per-joint success using 5% of that joint's range
#    - strict success = all joints within 5% for all T steps
# =====================================================

# get per-joint ranges from metadata (same place we got mean/std)
act_min = torch.tensor(meta.stats["action"]["min"])
act_max = torch.tensor(meta.stats["action"]["max"])
joint_ranges = act_max - act_min
tol_5pct = joint_ranges * 0.05  # (6,)

# overall errors
mae = (all_preds - all_gts).abs().mean()
mse = ((all_preds - all_gts) ** 2).mean()
print(f"\nMAE (overall): {mae:.4f}")
print(f"MSE (overall): {mse:.4f}")

# per-joint 5% success (averaged over time & samples)
# abs error: (N, T, 6)
abs_err = (all_preds - all_gts).abs()

# broadcast tol: (1,1,6)
within_5pct = abs_err <= tol_5pct.view(1, 1, -1)

per_joint_success = within_5pct.float().mean(dim=(0,1)) * 100.0

print("\nPer-joint 5% success:")
for j, s in enumerate(per_joint_success.tolist()):
    print(f"  joint {j}: {s:.2f}%")

# strict: a sample counts as success if for *every* timestep in 0..T-1 and *every* joint, we are inside 5%
strict_mask = within_5pct.all(dim=2).all(dim=1)  # (N,)
strict_success = strict_mask.float().mean() * 100.0
print(f"\n🎯 strict success (all {T} steps, all joints within 5%): {strict_success:.2f}%")

print("\nDone.")


Loading policy from: /content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
Reducing the number of VLM layers to 16 ...
Loading weights from local directory
chunk_size (action horizon): 50
fps: 30
val split: episodes >= 40
val samples: 2759

Starting offline eval (z-score unnorm, 5% per-joint)...





  processed 20/115 batches
  processed 40/115 batches
  processed 60/115 batches
  processed 80/115 batches
  processed 100/115 batches

Collected predictions: torch.Size([2759, 10, 6]), GT: torch.Size([2759, 10, 6])

MAE (overall): 6.5679
MSE (overall): 94.3286

Per-joint 5% success:
  joint 0: 36.20%
  joint 1: 43.02%
  joint 2: 61.52%
  joint 3: 68.73%
  joint 4: 56.61%
  joint 5: 53.61%

🎯 strict success (all 10 steps, all joints within 5%): 3.12%

Done.
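
Per-joint rates above 50% can coexist with a strict rate of a few percent, because the strict metric requires every joint at every step of a sample to pass at once. The toy sketch below uses made-up pass/fail flags (not data from this run) in plain Python, mirroring what `within_5pct.float().mean(dim=(0,1))` and `within_5pct.all(dim=2).all(dim=1)` compute in the cell above.

```python
# within[n][t][j] is True if sample n, step t, joint j is inside its tolerance.
# Made-up flags: 4 samples, 2 steps, 3 joints.
within = [
    [[True, True, True], [True, True, True]],     # passes everywhere
    [[False, True, True], [True, True, True]],    # joint 0 fails once
    [[True, False, True], [True, True, True]],    # joint 1 fails once
    [[True, True, True], [True, True, False]],    # joint 2 fails once
]

n_samples = len(within)
n_steps = len(within[0])
n_joints = len(within[0][0])

# Per-joint success: average over samples and steps, one number per joint.
per_joint = [
    100.0
    * sum(within[n][t][j] for n in range(n_samples) for t in range(n_steps))
    / (n_samples * n_steps)
    for j in range(n_joints)
]

# Strict success: a sample counts only if every step and every joint passes.
strict = 100.0 * sum(all(all(step) for step in sample) for sample in within) / n_samples

print(per_joint)  # each joint passes 7 of its 8 checks
print(strict)     # only 1 of the 4 samples passes everything
```

Each joint looks good in isolation here, yet three of the four samples contain at least one failure, so the strict rate is far lower than any per-joint rate.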


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# ======================================================
# CONFIG
# ======================================================
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_DIR = Path("/content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1")
BATCH_SIZE = 32
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Loading policy from: {MODEL_DIR}")
policy = SmolVLAPolicy.from_pretrained(MODEL_DIR)
policy.to(DEVICE).eval()

meta = LeRobotDatasetMetadata(DATASET_REPO)
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

fps = meta.fps
chunk_size = policy.config.chunk_size
print("chunk_size:", chunk_size)
print("fps:", fps)

# try to get normalization stats
action_stats = meta.stats["action"]
has_mean_std = ("mean" in action_stats) and ("std" in action_stats)
action_min = torch.tensor(action_stats["min"])
action_max = torch.tensor(action_stats["max"])
if has_mean_std:
    action_mean = torch.tensor(action_stats["mean"])
    action_std  = torch.tensor(action_stats["std"])
    print("using z-score unnormalization (x * std + mean)")
else:
    print("using min/max unnormalization ( (x+1)/2 * (max-min) + min )")


delta_timestamps = {              # build timestamps to match model
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(chunk_size)], #50 actions
}

# ======================================================
# build val split (episodes >= 40)
# ======================================================
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= 40]

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_ds = Subset(val_full, val_indices)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False, num_workers=4, pin_memory=True)
print(f"val samples: {len(val_ds)}")


# helpers
def fix_keys(batch):
    """Remap camera names (up/side -> camera1/camera2)."""
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")
    return batch

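def unnormalize_pred(pred_norm: torch.Tensor) -> torch.Tensor:
    """Map normalized model output (B, 6) back to raw dataset units."""
    if has_mean_std:
        return pred_norm * action_std + action_mean
    # min/max fallback announced above; assumes outputs were scaled to [-1, 1]
    return (pred_norm + 1) / 2 * (action_max - action_min) + action_min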


# EVAL
all_preds_raw = []
all_gts_raw = []

with torch.no_grad():
    for i, raw in enumerate(val_loader):
        # ground truth in RAW space (dataset space)
        gt_raw = raw["action"][:, 0, :].clone()     # (B,6)

        raw = fix_keys(raw)
        batch = preprocessor(raw)

        for k, v in list(batch.items()):
            if torch.is_tensor(v):
                batch[k] = v.to(DEVICE)

        # model inference to normalised action sequence
        pred_seq = policy.predict_action_chunk(batch)    # (B, 50, 6)
        pred_step0 = pred_seq[:, 0, :].cpu()             # (B,6) normalized

        # bring prediction back to RAW space
        pred_raw = unnormalize_pred(pred_step0)          # (B,6)
        # append the preds and gts to lists
        all_preds_raw.append(pred_raw)
        all_gts_raw.append(gt_raw)


all_preds_raw = torch.cat(all_preds_raw, dim=0)
all_gts_raw   = torch.cat(all_gts_raw, dim=0)

print("Collected preds:", all_preds_raw.shape, "Ground truths:", all_gts_raw.shape)

# ======================================================
# METRICS: per-joint 5% of its own range
# ======================================================
joint_ranges = action_max - action_min
tol = joint_ranges * 0.05     # 5% per joint

abs_err = torch.abs(all_preds_raw - all_gts_raw)   # (N,6)
within_5pr = abs_err <= tol                       # (True/False split)

per_joint_success = within_5pr.float().mean(dim=0) * 100.0
overall_mean = per_joint_success.mean().item()

print("\n========== EVAL (per-joint 5%) ==========")
for j, s in enumerate(per_joint_success):
    print(f"joint {j}: {s:.2f}%")

print(f"\nAverage per-joint success (5%): {overall_mean:.2f}%")
print("=========================================\n")


Loading policy from: /content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!


Reducing the number of VLM layers to 16 ...
Loading weights from local directory


Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

stats.json: 0.00B [00:00, ?B/s]

info.json: 0.00B [00:00, ?B/s]

meta/episodes/chunk-000/file-000.parquet:   0%|          | 0.00/72.6k [00:00<?, ?B/s]

meta/tasks.parquet:   0%|          | 0.00/2.25k [00:00<?, ?B/s]

chunk_size (action horizon): 50
fps: 30
using z-score unnormalization (x * std + mean)


Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]

.gitattributes: 0.00B [00:00, ?B/s]

README.md: 0.00B [00:00, ?B/s]

data/chunk-000/file-000.parquet:   0%|          | 0.00/370k [00:00<?, ?B/s]

videos/observation.images.side/chunk-000(…):   0%|          | 0.00/45.5M [00:00<?, ?B/s]

videos/observation.images.up/chunk-000/f(…):   0%|          | 0.00/40.1M [00:00<?, ?B/s]

val samples: 2759




Collected preds: torch.Size([2759, 6]) Ground truths: torch.Size([2759, 6])

joint 0: 45.20%
joint 1: 48.21%
joint 2: 69.52%
joint 3: 77.56%
joint 4: 61.00%
joint 5: 63.39%

Average per-joint success (5%): 60.81%
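
The per-joint metric above relies on broadcasting a per-joint tolerance vector against an `(N, 6)` error matrix. A minimal NumPy sketch on made-up numbers shows the same computation at a size that can be checked by hand. The ranges, predictions, and targets below are hypothetical (two joints instead of six, not taken from the dataset); torch tensors broadcast the same way.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 joints (not from the dataset).
joint_ranges = np.array([100.0, 10.0])   # max - min per joint
tol = joint_ranges * 0.05                # 5% of each joint's own range

preds = np.array([[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [40.0, 4.0]])
gts = np.array([[12.0, 1.2], [26.0, 2.8], [30.0, 3.9], [44.0, 4.3]])

abs_err = np.abs(preds - gts)            # (N, 2)
within_tol = abs_err <= tol              # (N, 2) boolean mask; tol broadcasts over rows

# Percentage of samples within tolerance, computed separately for each joint.
per_joint_success = within_tol.mean(axis=0) * 100.0
overall_mean = per_joint_success.mean()

print(per_joint_success, overall_mean)
```

Because each joint is compared against its own tolerance, a joint with a small range is held to a proportionally tighter absolute error than a joint with a large range.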



In [None]:
print("Using delta_timestamps:", delta_timestamps)


Using delta_timestamps: {'observation.state': [0.0], 'observation.images.up': [0.0], 'observation.images.side': [0.0], 'action': [0.0, 0.03333333333333333, 0.06666666666666667, 0.1, 0.13333333333333333, 0.16666666666666666, 0.2, 0.23333333333333334, 0.26666666666666666, 0.3, 0.3333333333333333, 0.36666666666666664, 0.4, 0.43333333333333335, 0.4666666666666667, 0.5, 0.5333333333333333, 0.5666666666666667, 0.6, 0.6333333333333333, 0.6666666666666666, 0.7, 0.7333333333333333, 0.7666666666666667, 0.8, 0.8333333333333334, 0.8666666666666667, 0.9, 0.9333333333333333, 0.9666666666666667, 1.0, 1.0333333333333334, 1.0666666666666667, 1.1, 1.1333333333333333, 1.1666666666666667, 1.2, 1.2333333333333334, 1.2666666666666666, 1.3, 1.3333333333333333, 1.3666666666666667, 1.4, 1.4333333333333333, 1.4666666666666666, 1.5, 1.5333333333333334, 1.5666666666666667, 1.6, 1.6333333333333333]}
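
The 50 `action` offsets printed above are the frame offsets `i / fps` for each step of the action chunk. The sketch below rebuilds them from the `fps` and `chunk_size` values the notebook reports (30 and 50), so the first and last entries can be checked against the printout.

```python
fps = 30          # frames per second, as printed by the notebook
chunk_size = 50   # action horizon, as printed by the notebook

# One offset per future action step, in seconds relative to the current frame.
action_offsets = [i / fps for i in range(chunk_size)]

print(len(action_offsets))   # number of action steps requested per sample
print(action_offsets[:3])    # first few offsets
print(action_offsets[-1])    # last offset: (chunk_size - 1) / fps seconds
```

The observation keys use a single offset of `0.0`, so each sample pairs one current observation with the next `chunk_size` actions.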


In [None]:
from pathlib import Path
import torch
from torch.utils.data import DataLoader, Subset
import numpy as np

# --- lerobot imports ---
from lerobot.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# ======================================================
# CONFIG
# ======================================================
DATASET_REPO = "lerobot/svla_so101_pickplace"
MODEL_DIR = Path("/content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1")
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

BATCH_SIZE = 32
K_STEPS = 50        # how many steps from the predicted sequence we evaluate (<= chunk_size)
VAL_EPISODE_MIN = 40   # val = episodes >= 40
SUCCESS_PCT = 0.05     # 5% per-joint

print(f"Loading policy from: {MODEL_DIR}")

# ======================================================
# 1) load policy + metadata + preprocessor
# ======================================================
policy = SmolVLAPolicy.from_pretrained(MODEL_DIR)
policy.to(DEVICE).eval()

meta = LeRobotDatasetMetadata(DATASET_REPO)
preprocessor, _ = make_pre_post_processors(policy.config, dataset_stats=meta.stats)

chunk_size = policy.config.chunk_size
fps = meta.fps
print("chunk_size (action horizon):", chunk_size)
print("fps:", fps)

# we know this dataset exposes action stats with mean/std (z-score)
action_stats = meta.stats["action"]
action_mean = torch.tensor(action_stats["mean"]).float()
action_std = torch.tensor(action_stats["std"]).float()

# min/max are used below to build the 5%-of-range success threshold per joint
action_min = torch.tensor(action_stats["min"]).float()
action_max = torch.tensor(action_stats["max"]).float()

# ======================================================
# 2) rebuild validation dataset exactly like training
# ======================================================
print("val split: episodes >=", VAL_EPISODE_MIN)

# first load base to read episode_index
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= VAL_EPISODE_MIN]
print("val samples:", len(val_indices))

# build the real val dataset with correct timestamps
delta_timestamps = {
    "observation.state": [0.0],
    "observation.images.up": [0.0],
    "observation.images.side": [0.0],
    "action": [i / fps for i in range(chunk_size)],
}

val_full = LeRobotDataset(
    DATASET_REPO,
    delta_timestamps=delta_timestamps,
    video_backend="pyav",
)
val_ds = Subset(val_full, val_indices)
val_loader = DataLoader(
    val_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    pin_memory=True,
    num_workers=4,
)

# ======================================================
# 3) helper fns (same as training)
# ======================================================
def fix_keys(batch: dict) -> dict:
    # rename camera keys and pad missing
    if "observation.images.up" in batch:
        batch["observation.images.camera1"] = batch.pop("observation.images.up")
    if "observation.images.side" in batch:
        batch["observation.images.camera2"] = batch.pop("observation.images.side")

    if "observation.images.camera1" in batch and "observation.images.camera2" not in batch:
        batch["observation.images.camera2"] = torch.zeros_like(batch["observation.images.camera1"])
    if "observation.images.camera2" in batch and "observation.images.camera1" not in batch:
        batch["observation.images.camera1"] = torch.zeros_like(batch["observation.images.camera2"])

    # images may arrive as (B, T, C, H, W); keep only the first frame (index 0 along T)
    for cam in ["observation.images.camera1", "observation.images.camera2"]:
        if cam in batch:
            img = batch[cam]
            if img.ndim == 5:
                batch[cam] = img[:, 0]   # (B, C, H, W)
    return batch


def ensure_task(batch: dict) -> dict:
    if "task" not in batch:
        # infer B
        B = None
        for v in batch.values():
            if torch.is_tensor(v):
                B = v.shape[0]
                break
        if B is None:
            B = 1
        batch["task"] = ["Pick and place the object."] * B
    return batch


def unnorm_from_zscore(actions_norm: torch.Tensor) -> torch.Tensor:
    """
    Invert the preprocessor's z-score normalization.
    Forward: x = (raw - mean) / std, so raw = x * std + mean.
    actions_norm: (..., 6)
    """
    return actions_norm * action_std + action_mean


# build 5% thresholds per joint from range
joint_ranges = action_max - action_min         # (6,)
joint_thresholds = joint_ranges * SUCCESS_PCT  # (6,)


# ======================================================
# 4) evaluation loop
# ======================================================
print("\nStarting offline eval (z-score unnorm, 5% per-joint)...\n")

all_preds = []
all_gts = []

with torch.no_grad():
    for i, raw in enumerate(val_loader):

        # GT is already in dataset space: (B, chunk_size, 6)
        gt_seq = raw["action"][:, :K_STEPS, :].clone()   # (B, K, 6)

        # preprocess to match model
        raw = fix_keys(raw)
        raw = ensure_task(raw)
        batch = preprocessor(raw)

        for k, v in list(batch.items()):
            if torch.is_tensor(v):
                batch[k] = v.to(DEVICE)

        # model predicts a whole chunk in normalized space
        pred_seq_norm = policy.predict_action_chunk(batch)   # (B, chunk_size, 6)

        # keep same K as GT
        pred_seq_norm = pred_seq_norm[:, :K_STEPS, :]        # (B, K, 6)

        # unnormalize back to dataset space
        pred_seq = unnorm_from_zscore(pred_seq_norm.cpu())   # (B, K, 6)

        all_preds.append(pred_seq)
        all_gts.append(gt_seq)

        if (i + 1) % 20 == 0:
            print(f"  processed {i+1}/{len(val_loader)} batches")

# concat
all_preds = torch.cat(all_preds, dim=0)   # (N, K, 6)
all_gts   = torch.cat(all_gts, dim=0)     # (N, K, 6)

print("\nCollected preds:", all_preds.shape, "GT:", all_gts.shape)

# ======================================================
# 5) metrics
# ======================================================
# abs error per sample, per step, per joint
abs_err = (all_preds - all_gts).abs()      # (N, K, 6)

# per-joint success: share of (sample, step) pairs whose error is within 5% of that joint's range
per_joint_success = []
for j in range(abs_err.shape[-1]):
    # scalar threshold for this joint
    thr = joint_thresholds[j]
    ok = (abs_err[:, :, j] <= thr).float().mean().item() * 100.0
    per_joint_success.append(ok)

print("\n========== EVAL (per-joint 5%) ==========")
for j, s in enumerate(per_joint_success):
    print(f"joint {j}: {s:.2f}%")
print(f"\nAverage per-joint success (5%): {sum(per_joint_success)/len(per_joint_success):.2f}%")
print("=========================================\n")

# (optional) strict: ALL steps and ALL joints within their own 5%
strict_ok = (abs_err <= joint_thresholds.view(1, 1, -1)).all(dim=2).all(dim=1).float().mean() * 100.0
print(f"Strict (all {K_STEPS} steps, all 6 joints within 5%): {strict_ok:.2f}%\n")

# (optional) step-0 metric
abs_err_step0 = abs_err[:, 0, :]  # (N, 6)
step0_per_joint = []
for j in range(abs_err_step0.shape[-1]):
    thr = joint_thresholds[j]
    ok = (abs_err_step0[:, j] <= thr).float().mean().item() * 100.0
    step0_per_joint.append(ok)

print("Step-0 per-joint success (5%):")
for j, s in enumerate(step0_per_joint):
    print(f"  joint {j}: {s:.2f}%")


Loading policy from: /content/drive/MyDrive/ELM/lerobot_outputs/smolvla_proper_split_final/best_model1
Loading  HuggingFaceTB/SmolVLM2-500M-Video-Instruct weights ...
Reducing the number of VLM layers to 16 ...
Loading weights from local directory
chunk_size (action horizon): 50
fps: 30
val split: episodes >= 40
val samples: 2759

Starting offline eval (z-score unnorm, 5% per-joint)...





  processed 20/87 batches
  processed 40/87 batches
  processed 60/87 batches
  processed 80/87 batches

Collected preds: torch.Size([2759, 50, 6]) GT: torch.Size([2759, 50, 6])

joint 0: 32.89%
joint 1: 31.82%
joint 2: 48.65%
joint 3: 51.74%
joint 4: 42.42%
joint 5: 34.46%

Average per-joint success (5%): 40.33%

Strict (all 50 steps, all 6 joints within 5%): 0.00%

Step-0 per-joint success (5%):
  joint 0: 44.65%
  joint 1: 47.26%
  joint 2: 69.30%
  joint 3: 78.00%
  joint 4: 61.07%
  joint 5: 64.23%
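
Step-0 success is higher than the average over all 50 steps for every joint, which suggests accuracy drops further into the predicted chunk. A per-step success curve makes that visible. The sketch below uses synthetic errors that grow with the step index; the sizes, tolerances, and error model are hypothetical and stand in for the notebook's `abs_err` and `joint_thresholds`.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 200, 5, 2                       # hypothetical sizes (the notebook has K=50, D=6)
thresholds = np.array([1.0, 0.5])         # hypothetical per-joint tolerances, shape (D,)

# Synthetic absolute errors whose scale grows linearly with the step index.
step_scale = np.linspace(0.2, 2.0, K).reshape(1, K, 1)
abs_err = np.abs(rng.normal(size=(N, K, D))) * step_scale    # (N, K, D)

within = abs_err <= thresholds.reshape(1, 1, D)              # (N, K, D) boolean mask

# Average over samples and joints, keeping the step axis: one success rate per step.
per_step_success = within.mean(axis=(0, 2)) * 100.0          # (K,)

for k, s in enumerate(per_step_success):
    print(f"step {k}: {s:.1f}%")
```

With the notebook's torch tensors, the analogous expression would be `(abs_err <= joint_thresholds.view(1, 1, -1)).float().mean(dim=(0, 2))`.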


In [None]:
import torch

# all_preds: (N, K, 6)
# all_gts:   (N, K, 6)
# joint_thresholds: (6,)

assert all_preds.shape == all_gts.shape
N, K, D = all_preds.shape  # D = 6 joints

# 1) absolute error over everything
abs_err = (all_preds - all_gts).abs()   # (N, K, 6)

# 2) per-joint MAE averaged over ALL K steps and ALL samples
per_joint_mae = abs_err.mean(dim=(0, 1))   # (6,)
print("\nPer-joint MAE over ALL K steps:")
for j in range(D):
    print(f"  joint {j}: {per_joint_mae[j]:.4f}")

# 3) per-joint success over ALL K steps:
#    “how often was this joint within its 5% threshold at that step?”
per_joint_success = []
for j in range(D):
    thr = joint_thresholds[j]                    # scalar
    ok = (abs_err[:, :, j] <= thr).float().mean()  # average over N and K
    per_joint_success.append(ok.item() * 100.0)

print("\nPer-joint success (5% of own range), averaged over ALL K steps:")
for j, s in enumerate(per_joint_success):
    print(f"  joint {j}: {s:.2f}%")

# 4) overall average across joints (this is the single number you can report)
avg_success = sum(per_joint_success) / len(per_joint_success)
print(f"\nAverage per-joint success across all K steps: {avg_success:.2f}%")



Per-joint MAE over ALL K steps:
  joint 0: 23.7162
  joint 1: 18.6335
  joint 2: 9.4588
  joint 3: 5.7198
  joint 4: 10.0489
  joint 5: 5.0625

Per-joint success (5% of own range), averaged over ALL K steps:
  joint 0: 32.89%
  joint 1: 31.82%
  joint 2: 48.65%
  joint 3: 51.74%
  joint 4: 42.42%
  joint 5: 34.46%

Average per-joint success across all K steps: 40.33%
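
Raw MAE values are hard to compare across joints because each joint has a different range. Dividing by the range puts them on the same scale as the 5% tolerance. The sketch below copies the MAE values printed above and the `joint_ranges` values printed elsewhere in this notebook; it assumes both come from the same dataset statistics.

```python
# Values copied from the notebook's printed output (rounded to 4 decimals).
per_joint_mae = [23.7162, 18.6335, 9.4588, 5.7198, 10.0489, 5.0625]
joint_ranges = [181.4706, 108.1263, 87.0278, 65.9584, 72.7717, 32.9985]

# MAE expressed as a percentage of each joint's range.
mae_pct_of_range = [100.0 * m / r for m, r in zip(per_joint_mae, joint_ranges)]

for j, p in enumerate(mae_pct_of_range):
    print(f"joint {j}: MAE is {p:.2f}% of range")
```

Under that assumption, every joint's mean error is larger than 5% of its range, which is consistent with the success rates near or below 50% reported above.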


In [None]:
print(within_5pct)

tensor([[ True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True],
        ...,
        [ True, False,  True, False,  True, False],
        [ True, False,  True, False,  True, False],
        [ True, False,  True,  True,  True, False]])


In [None]:
base_ds = LeRobotDataset(DATASET_REPO, video_backend="pyav")
episode_idx = np.array(base_ds.hf_dataset["episode_index"])
val_indices = [i for i, ep in enumerate(episode_idx) if ep >= VAL_EPISODE_MIN]

print(base_ds)
print(episode_idx)
print(val_indices)

LeRobotDataset({
    Repository ID: 'lerobot/svla_so101_pickplace',
    Number of selected episodes: '50',
    Number of selected samples: '11939',
    Features: '['action', 'observation.state', 'observation.images.up', 'observation.images.side', 'timestamp', 'frame_index', 'episode_index', 'index', 'task_index']',
})',

[ 0  0  0 ... 49 49 49]
[9180, 9181, 9182, 9183, 9184, 9185, 9186, 9187, 9188, 9189, 9190, 9191, 9192, 9193, 9194, 9195, 9196, 9197, 9198, 9199, 9200, 9201, 9202, 9203, 9204, 9205, 9206, 9207, 9208, 9209, 9210, 9211, 9212, 9213, 9214, 9215, 9216, 9217, 9218, 9219, 9220, 9221, 9222, 9223, 9224, 9225, 9226, 9227, 9228, 9229, 9230, 9231, 9232, 9233, 9234, 9235, 9236, 9237, 9238, 9239, 9240, 9241, 9242, 9243, 9244, 9245, 9246, 9247, 9248, 9249, 9250, 9251, 9252, 9253, 9254, 9255, 9256, 9257, 9258, 9259, 9260, 9261, 9262, 9263, 9264, 9265, 9266, 9267, 9268, 9269, 9270, 9271, 9272, 9273, 9274, 9275, 9276, 9277, 9278, 9279, 9280, 9281, 9282, 9283, 9284, 9285, 9286, 9287, 9288
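
The split above selects frames by an episode-level cut-off, so no episode is shared between training and validation. The printed numbers are consistent with that: the first validation index is 9180, and 11939 - 9180 = 2759 matches the reported number of validation samples, assuming frames are stored in episode order. The sketch below shows the same selection on a tiny hypothetical index array, using `np.flatnonzero` in place of the list comprehension.

```python
import numpy as np

# Hypothetical per-frame episode indices: 3 episodes with 2, 3, and 2 frames.
episode_idx = np.array([0, 0, 1, 1, 1, 2, 2])
val_episode_min = 1   # hypothetical cut-off: episodes >= 1 go to validation

# Frame positions whose episode index satisfies each condition, as plain Python ints.
val_indices = np.flatnonzero(episode_idx >= val_episode_min).tolist()
train_indices = np.flatnonzero(episode_idx < val_episode_min).tolist()

print(val_indices)
print(train_indices)
```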

In [None]:
print(abs_err)

tensor([[3.1653e+00, 1.5401e+00, 3.6829e+00, 1.5907e+00, 1.8036e-02, 1.2496e+00],
        [7.7571e+00, 8.4783e-01, 4.2213e+00, 5.2963e-01, 5.6801e-01, 7.7189e-01],
        [7.6921e+00, 1.6992e+00, 3.3586e+00, 3.8784e-01, 2.0568e+00, 1.2114e+00],
        ...,
        [5.9817e+00, 1.7380e+01, 4.2885e+00, 3.8774e+00, 1.1286e-01, 1.8689e+00],
        [8.1969e+00, 2.1706e+01, 2.9123e+00, 6.2907e+00, 5.2727e-01, 1.7922e+00],
        [8.2440e+00, 2.0839e+01, 1.5118e+00, 2.0707e+00, 3.2801e-01, 1.9056e+00]],
       dtype=torch.float64)


In [None]:
print(batch)

{'action': tensor([[[-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         ...,
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721]],

        [[-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         ...,
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721]],

        [[-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -0.2357, -0.5721],
         [-0.0711, -0.6231,  1.1976, -0.5100, -

In [None]:
print(fix_keys(raw))

{'action': tensor([[[  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         ...,
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002]],

        [[  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         ...,
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002]],

        [[  4.8529, -78.6947, 100.0000,  62.4309, -57.6068,   1.7002],
         [  4.8529, -78.6947, 100.
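
The two printouts above appear to show the same `action` tensor in normalized space (after the preprocessor) and in raw dataset units. Assuming z-score normalization, as the notebook states, the two are related by `x = (raw - mean) / std`. The sketch below demonstrates the round trip with made-up `mean` and `std` values; they are hypothetical and are not the dataset's statistics.

```python
import numpy as np

# Hypothetical statistics for 3 joints (not the dataset's real mean/std).
mean = np.array([10.0, -50.0, 40.0])
std = np.array([20.0, 30.0, 25.0])

raw = np.array([[5.0, -80.0, 100.0],
                [15.0, -20.0, 40.0]])    # (B, 3) actions in raw units

normalized = (raw - mean) / std          # forward: z-score
recovered = normalized * std + mean      # inverse: back to raw units

print(normalized)
print(np.allclose(recovered, raw))
```

Errors should be compared in one space only: applying the 5%-of-range tolerance to normalized values, or mixing normalized predictions with raw targets, would give meaningless success rates.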

In [None]:
print(joint_ranges)

tensor([181.4706, 108.1263,  87.0278,  65.9584,  72.7717,  32.9985],
       dtype=torch.float64)


In [None]:
print(tol)

tensor([9.0735, 5.4063, 4.3514, 3.2979, 3.6386, 1.6499], dtype=torch.float64)
