feat(scripts): annotate_subtasks.py — VLM subtask labelling for dataset mixtures #215
Adds a new script that samples 1 fps frames from episode videos, sends them to claude-opus-4-7, and writes per-episode subtask boundary JSONs compatible with add_subtask_response.py. Hub-only datasets (no root) are downloaded via snapshot_download before processing. Includes a public example config at example/train_mixture_config.json. Adds anthropic>=0.55.0 as a project dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
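As a rough illustration of the fixed-rate sampling described in this commit, the sketch below computes the timestamps at which frames would be grabbed from an episode video. The function name `sample_timestamps` and its parameters are hypothetical and are not taken from annotate_subtasks.py; the actual script may pick frames differently.

```python
import math


def sample_timestamps(duration_s: float, sample_fps: float = 1.0) -> list[float]:
    """Return timestamps (in seconds) at which to sample frames.

    Hypothetical helper: sampling starts at t=0 and stays strictly
    before the end of the clip, one timestamp every 1/sample_fps seconds.
    """
    if duration_s <= 0 or sample_fps <= 0:
        raise ValueError("duration_s and sample_fps must be positive")
    # Number of sample points that fit before the end of the clip.
    # The small epsilon guards against float noise such as 3.0000000000000004.
    n = max(1, math.ceil(duration_s * sample_fps - 1e-9))
    return [i / sample_fps for i in range(n)]


if __name__ == "__main__":
    # A 10 s episode at the default 1 fps yields ten timestamps: 0.0 .. 9.0
    print(sample_timestamps(10.0))
```

At the default rate, a 10-second episode yields 10 frames, so the number of images per request grows linearly with episode length.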
…ted kwarg Replaces the placeholder local/example dataset with the real public TensorAuto/IceLemonade_100 Hub dataset and removes the fake lerobot/pusht entry. Also drops the deprecated local_dir_use_symlinks=False kwarg from snapshot_download (huggingface_hub ≥0.24 no longer needs it). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[claude-review] summary for commit 915af4a Latest commit (915af4a) addresses four of the prior findings:
Note: the new Note: PR description's sample output for |
…ust parsing

- Skip parquet update when 'response' column already exists (metadata-only check), so reruns are actually O(1) per episode instead of re-reading and re-writing every parquet.
- Thread --sample-fps into both system and user prompts (was hardcoded to "1 fps") so Claude isn't misled when a non-default rate is used.
- Pick the first text block from response.content instead of [0].text; validate parsed subtask entries have time+subtask before use.
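The last bullet can be sketched as follows. This is a minimal illustration and not the script's actual code: `first_text` and `valid_entries` are hypothetical names, and the content blocks are simulated with `SimpleNamespace` stand-ins that carry the same `type` and `text` attributes the commit message relies on.

```python
import json
from types import SimpleNamespace


def first_text(content_blocks) -> str:
    """Return the text of the first block whose type is "text".

    Indexing content[0].text breaks when the first block is not a
    text block, so the blocks are scanned instead.
    """
    for block in content_blocks:
        if getattr(block, "type", None) == "text":
            return block.text
    raise ValueError("response contained no text block")


def valid_entries(parsed) -> list[dict]:
    """Keep only dict entries that carry both "time" and "subtask"."""
    if not isinstance(parsed, list):
        raise ValueError("expected a JSON array of subtask entries")
    return [
        entry
        for entry in parsed
        if isinstance(entry, dict) and "time" in entry and "subtask" in entry
    ]


if __name__ == "__main__":
    # Stand-in blocks: a non-text block first, then the text block.
    blocks = [
        SimpleNamespace(type="thinking", thinking="..."),
        SimpleNamespace(
            type="text",
            text='[{"time": 0.0, "subtask": "reach"}, {"subtask": "no time"}]',
        ),
    ]
    print(valid_entries(json.loads(first_text(blocks))))
```

Malformed entries are dropped rather than raising here; whether an episode with some invalid entries should be kept or rejected is a design choice this sketch does not settle.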
Inline findings on annotate_subtasks.py — only the still-applicable issues from the previous review are reposted; the four fixes in 915af4a (text-block iteration, valid-entry filtering, fps-templated prompts, parquet-skip on rerun) clear the rest.
@WilliamYue37 Can we also handle rate limits, so the script doesn't crash when it hits them?
@claude fix
- addresses @claude[bot] (frame cap): add MAX_FRAMES_PER_REQUEST=100 with --max-frames CLI flag; long clips are uniformly subsampled instead of silently exceeding the Anthropic Messages 100-image limit. Comment why the stride floor is needed.
- addresses @claude[bot] (parquet length): trust episodes.jsonl length in _update_parquet_response, warn + pad/truncate on parquet row mismatch (mirrors add_subtask_response.py:156-167).
- addresses @claude[bot] (--help): short description= + epilog=__doc__ so the flag list is no longer buried under the module docstring.
- addresses @claude[bot] (silent rerun skip): bump 'response column already present' log to INFO with delete-to-regenerate hint; document the same in the module docstring.
- addresses @claude[bot] (v2.1 expectation): warn when info.codebase_version is not 'v2.1' and document the limitation in the module docstring.
- addresses @claude[bot] (no tests): add tests/scripts/test_annotate_subtasks.py covering _parse_json_response (fence stripping, non-array rejection) and _coerce_subtasks (entry filtering, time=0.0 backfill, empty rejection). Extracted _coerce_subtasks helper to make filtering testable.
- addresses @akshay18iitg (rate limits): pass max_retries=--max-api-retries (default 8) to the Anthropic client so 429/5xx responses retry with the SDK's built-in exponential backoff instead of crashing the run.

tests: passed -- pytest -m "not gpu" -n auto tests/scripts/test_annotate_subtasks.py tests/scripts/test_add_subtask_response.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
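To make the frame-cap item concrete, here is one way uniform subsampling under a fixed cap could look. It is a sketch under assumptions: `subsample_uniform` is a hypothetical name, and it picks evenly spaced indices with integer arithmetic rather than the stride-based approach the commit message alludes to, so it is not the script's implementation.

```python
def subsample_uniform(items: list, max_items: int = 100) -> list:
    """Return at most max_items elements, evenly spread over the input.

    Hypothetical helper: short inputs are returned unchanged; longer
    ones keep the first and last element and evenly spaced ones between.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    n = len(items)
    if n <= max_items:
        return list(items)
    if max_items == 1:
        return [items[0]]
    # Integer arithmetic keeps indices exact; because n > max_items, the
    # spacing exceeds 1 and every chosen index is distinct.
    indices = [(i * (n - 1)) // (max_items - 1) for i in range(max_items)]
    return [items[i] for i in indices]


if __name__ == "__main__":
    frames = list(range(250))  # stand-in for 250 sampled frames
    kept = subsample_uniform(frames, max_items=100)
    print(len(kept), kept[0], kept[-1])
```

Keeping both endpoints means the start and end of a long clip are always represented, at the cost of slightly uneven spacing between neighbouring frames.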
[claude-fix] @akshay18iitg done in 58bc039. The Anthropic SDK has built-in retry-with-backoff for |
What this does

Adds src/opentau/scripts/annotate_subtasks.py, a new offline annotation script that automatically labels every episode in a dataset mixture with subtask boundaries using claude-opus-4-7.

How it works (efficiently): frames are sampled from each episode video at --sample-fps, and the model returns [{"time": float, "subtask": str}, ...] boundaries.

Hub dataset support: datasets without a local root are downloaded via huggingface_hub.snapshot_download into ~/.cache/huggingface/opentau_subtasks/ before processing.

Output is written as per-episode JSONs compatible with the existing add_subtask_response.py, and optionally expanded into a response column in each episode parquet (--write-response-column, on by default).

Adds anthropic>=0.55.0 as a project dependency. Adds configs/examples/train_mixture_config.json as a public example config pointing at lerobot/droid_100 (pinned to v2.1). Adds documentation in the Datasets tutorial.

How it was tested
Ran against lerobot/droid_100 at revision=v2.1 (Hub download path) and the local shuheng_bottle_lift dataset (local path):

```bash
# Hub dataset: downloads, annotates 1 episode, checks subtask JSON
python src/opentau/scripts/annotate_subtasks.py \
  --config-path configs/examples/train_mixture_config.json \
  --max-episodes-per-dataset 1 \
  --no-write-response-column
```

Sample output for lerobot/droid_100 episode 0 (task: "Put the marker in the pot"):

```json
[
  {"time": 0.0, "subtask": "approaching the marker on the table"},
  {"time": 4.0, "subtask": "grasping the marker"},
  {"time": 6.0, "subtask": "lifting and moving marker toward pot"},
  {"time": 8.0, "subtask": "placing marker into the pot"},
  {"time": 10.0, "subtask": "retracting arm away from pot"}
]
```

Also verified: response column added correctly; meta/info.json updated with subtask_path and response feature.

How to checkout & try? (for the reviewer)
Checklist
Note: Before submitting this PR, please read the contributor guideline.