fix: upsert dataset_feature_name_mapping if specified with dataset #139

Merged
shuheng-liu merged 2 commits into main from fix/custom-dataset-feature-map on Mar 12, 2026
@shuheng-liu
Member

What this does

Fixes #138 (data_features_name_mapping doesn't work).

How it was tested

Removed the following section from standard_data_format_mapping.py

"physical-intelligence/libero": {
    "camera0": "image",
    "camera1": "wrist_image",
    "state": "state",
    "actions": "actions",
    "prompt": "task",
    "response": "response",
},

and added the following to pi05_training_config.json

"data_features_name_mapping": {
    "camera0": "image",
    "camera1": "wrist_image",
    "state": "state",
    "actions": "actions",
    "prompt": "task",
    "response": "response"
}

and trained with

accelerate launch src/opentau/scripts/train.py --config_path=configs/examples/pi05_training_config.json
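
For readers unfamiliar with the mapping format, here is a minimal sketch of what such a mapping expresses. It assumes the keys are the standard feature names and the values are the dataset's own column names; that direction is inferred from the example above rather than confirmed by it. The helper standardize_sample and the sample values are hypothetical and are not part of OpenTau.

```python
from typing import Any, Dict


def standardize_sample(sample: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `sample` re-keyed by standard feature names.

    Assumption: `mapping` is {standard_name: dataset_column_name}.
    """
    missing = [src for src in mapping.values() if src not in sample]
    if missing:
        raise KeyError(f"dataset is missing columns: {missing}")
    return {std: sample[src] for std, src in mapping.items()}


# The mapping from the PR description.
libero_mapping = {
    "camera0": "image",
    "camera1": "wrist_image",
    "state": "state",
    "actions": "actions",
    "prompt": "task",
    "response": "response",
}

# Hypothetical raw sample using the dataset's own column names.
raw_sample = {
    "image": "front.png",
    "wrist_image": "wrist.png",
    "state": [0.0, 0.1],
    "actions": [1.0],
    "task": "pick up the bowl",
    "response": "",
}

standardized = standardize_sample(raw_sample, libero_mapping)
```

With this reading, the training code only ever sees camera0, camera1, prompt and so on, regardless of how a given dataset names its columns.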

How to check out & try? (for the reviewer)

Remove the following section from standard_data_format_mapping.py

"physical-intelligence/libero": {
    "camera0": "image",
    "camera1": "wrist_image",
    "state": "state",
    "actions": "actions",
    "prompt": "task",
    "response": "response",
},

and add the following to pi05_training_config.json

"data_features_name_mapping": {
    "camera0": "image",
    "camera1": "wrist_image",
    "state": "state",
    "actions": "actions",
    "prompt": "task",
    "response": "response"
}

and train with

accelerate launch src/opentau/scripts/train.py --config_path=configs/examples/pi05_training_config.json

Checklist

  • I have added Google-style docstrings to important functions and ensured function parameters are typed.
  • My PR includes policy-related changes.
    • If the above is checked: I have run the GPU pytests (pytest -m "gpu") and regression tests.

Note: Before submitting this PR, please read the contributor guideline.

Copilot AI review requested due to automatic review settings March 12, 2026 03:18
@shuheng-liu shuheng-liu self-assigned this Mar 12, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR ensures custom data_features_name_mapping overrides are available when datasets are loaded via PyTorch DataLoader workers (which run in separate processes under spawn), so feature standardization works consistently across parent and worker processes.

Changes:

  • Register DatasetConfig.data_features_name_mapping into the global DATA_FEATURES_NAME_MAPPING during config initialization.
  • Add a worker_init_fn to WeightedDatasetMixture.get_dataloader() to apply mapping overrides inside each worker process.
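
The two changes above can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: the DatasetConfig here has only two fields, repo_id is an assumed field name, and upsert_mapping, apply_overrides and make_worker_init_fn are hypothetical helpers. The global dict is a stand-in for the real DATA_FEATURES_NAME_MAPPING.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List, Optional

# Stand-in for the global registry named in the review.
DATA_FEATURES_NAME_MAPPING: Dict[str, Dict[str, str]] = {}


def upsert_mapping(repo_id: str, mapping: Dict[str, str]) -> None:
    """Insert the mapping for `repo_id`, overwriting any existing entry."""
    DATA_FEATURES_NAME_MAPPING[repo_id] = dict(mapping)


@dataclass
class DatasetConfig:
    # Simplified, hypothetical config; the real class has more fields.
    repo_id: str
    data_features_name_mapping: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        # Register the override in the process that builds the config.
        if self.data_features_name_mapping is not None:
            upsert_mapping(self.repo_id, self.data_features_name_mapping)


def apply_overrides(overrides: Dict[str, Dict[str, str]], worker_id: int) -> None:
    """Re-apply the overrides inside a worker process.

    A spawned worker re-imports modules and starts from the default
    registry, so changes made in the parent are not visible there.
    """
    for repo_id, mapping in overrides.items():
        upsert_mapping(repo_id, mapping)


def make_worker_init_fn(configs: List[DatasetConfig]) -> Callable[[int], None]:
    """Build a callable suitable for DataLoader's `worker_init_fn` argument."""
    overrides = {
        c.repo_id: c.data_features_name_mapping
        for c in configs
        if c.data_features_name_mapping is not None
    }
    # partial over a module-level function, since the callable is sent
    # to spawned workers and a local closure cannot be pickled.
    return partial(apply_overrides, overrides)
```

The returned callable would be passed as worker_init_fn when constructing torch.utils.data.DataLoader, which calls it with the worker id at the start of each worker process.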

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/opentau/datasets/dataset_mixture.py | Adds a worker init hook to upsert mapping overrides in DataLoader worker processes. |
| src/opentau/configs/default.py | Upserts config-provided feature mappings into the global mapping during DatasetConfig.__post_init__. |

Comment thread src/opentau/configs/default.py Outdated
Comment thread src/opentau/configs/default.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@shuheng-liu shuheng-liu merged commit 850b182 into main Mar 12, 2026
5 checks passed
@shuheng-liu shuheng-liu deleted the fix/custom-dataset-feature-map branch March 12, 2026 16:16
Development

Successfully merging this pull request may close these issues.

data_features_name_mapping doesn't work

3 participants