Nvcf data collection #20

Merged
jaehunjung1 merged 4 commits into NVIDIA-NeMo:jh/kimi from bcui-nvda:nvcf-data-collection
Mar 18, 2026

@bcui-nvda

End-user friendly description of the problem this fixes or functionality this introduces.


Summarize what the PR does, explaining any non-trivial design decisions.


Link of any specific issues this addresses:

Brandon Cui and others added 4 commits March 10, 2026 14:45
Cherry-picked from bcui/init_nvcf_data:
- debug_env_controller.py: rewritten to use OSWorld DesktopEnv
- module_data_collector.py: DesktopEnv integration + NVCF pre-download
- parallel_collect_trajectories.py: runtime->env, NVCF support
- cleanup_nvcf.py: utility to list/cleanup NVCF functions
- Rewrite env controllers to support both singularity and NVCF backends
  via duck-type dispatch (_is_desktop_env)
- Add --runtime flag to parallel_collect_kimi.py
- Add NVCF launcher scripts (run_parallel_kimi_nvcf.sh, run_collector_kimi_nvcf.sh)
- Add collect_trajectories_sbatch.sh from bcui/init_nvcf_data
- Set OSWORLD_SETUP_CACHE_DIR and NVCF_FUNCTION_NAME_PREFIX env vars
- Install all required Python deps (openai, Pillow, jsonlines, pandas, requests)
  with --break-system-packages for PEP 668 containers
- Remove ipdb import from data_collector.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jaehunjung1 jaehunjung1 marked this pull request as ready for review March 18, 2026 00:27
@jaehunjung1 jaehunjung1 merged commit 49ea590 into NVIDIA-NeMo:jh/kimi Mar 18, 2026
1 check failed
2 participants