
Push Data Designer datasets to Harbor Hub #619

@kirit93

Description


Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Add first-class support for exporting and publishing NeMo Data Designer outputs as Harbor datasets on Harbor Hub.

Harbor Hub supports public and private datasets for training and evaluation. When users need a custom benchmark, NeMo Data Designer is a natural place to generate, validate, filter, and score that data. The missing piece is a simple, reliable path from a Data Designer-generated dataset to a published Harbor Hub dataset that can be run with Harbor evals.

Problem

Today, users can run public or private Harbor benchmarks from Harbor Hub, but if they generate a custom benchmark with NeMo Data Designer, they still need to manually convert the output into Harbor’s dataset/task structure.

That conversion is non-trivial. A Harbor dataset is a collection of Harbor tasks, and each task is a directory of several files rather than a single row of data. A typical runnable Harbor task includes:

  • task.toml
  • instruction.md
  • tests/test.sh
  • an environment/ definition, such as environment/Dockerfile for Docker-based tasks
  • optional solution/solve.sh for oracle validation
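To make the conversion concrete, here is a minimal sketch of writing one task directory with the layout listed above. It assumes a generated record already provides the instruction text, test script, and Dockerfile as strings. `write_harbor_task` is a hypothetical helper, and the `task.toml` body is a placeholder, not Harbor's confirmed schema.

```python
from pathlib import Path
from typing import Optional

# Placeholder task.toml body. These fields are assumptions for illustration;
# check Harbor's task documentation for the real schema.
DEFAULT_TASK_TOML = 'version = "1.0"\n\n[metadata]\nsource = "nemo-data-designer"\n'


def write_harbor_task(
    dataset_dir: Path,
    task_id: str,
    instruction: str,
    test_script: str,
    dockerfile: str,
    solution_script: Optional[str] = None,
    task_toml: str = DEFAULT_TASK_TOML,
) -> Path:
    """Write one task directory using the layout described in this issue (hypothetical helper)."""
    task_dir = dataset_dir / task_id
    files = {
        "task.toml": task_toml,
        "instruction.md": instruction,
        "tests/test.sh": test_script,
        "environment/Dockerfile": dockerfile,
    }
    if solution_script is not None:
        # Optional oracle solution, used to check that the task is solvable.
        files["solution/solve.sh"] = solution_script

    for rel_path, content in files.items():
        path = task_dir / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        if path.suffix == ".sh":
            # Mark shell scripts executable in case the harness runs them directly.
            path.chmod(0o755)
    return task_dir
```

Making the solution optional mirrors the list above: tasks generated without a reference solution still export cleanly; they simply cannot be oracle-validated.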

Describe the solution you'd like

Add a Harbor Hub export/publish integration to NeMo Data Designer.

At a high level, Data Designer should support two related operations:

  1. Export a generated dataset into a local Harbor-compatible dataset directory.
  2. Publish that dataset to Harbor Hub using Harbor’s existing publishing workflow.
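The two operations could be split along the same boundary in code. The sketch below assumes Data Designer output arrives as a list of dicts (for example via pandas `DataFrame.to_dict(orient="records")`) with hypothetical column names such as `instruction`, `test_sh`, and `dockerfile`. It covers operation 1 plus a pre-publish check; the actual upload in operation 2 is left to Harbor's publishing workflow and is not called here. `export_dataset`, `validate_dataset`, and `COLUMN_TO_FILE` are illustrative names, not an existing API.

```python
import re
from pathlib import Path
from typing import Iterable, Mapping

# Files every exported task must contain, per the layout described in this issue.
# solution/solve.sh is optional, so it is not listed here.
REQUIRED_FILES = ("task.toml", "instruction.md", "tests/test.sh", "environment/Dockerfile")

# Hypothetical mapping from Data Designer column names to task files.
COLUMN_TO_FILE = {
    "task_toml": "task.toml",
    "instruction": "instruction.md",
    "test_sh": "tests/test.sh",
    "dockerfile": "environment/Dockerfile",
    "solution_sh": "solution/solve.sh",
}


def slugify(name: str) -> str:
    """Turn an arbitrary task name into a filesystem-safe directory name."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "task"


def export_dataset(records: Iterable[Mapping[str, str]], out_dir: Path) -> list[Path]:
    """Operation 1: write one task directory per generated record."""
    task_dirs: list[Path] = []
    seen: set[str] = set()
    for i, record in enumerate(records):
        base = slugify(record.get("task_id") or f"task-{i}")
        name, n = base, 1
        while name in seen:  # keep directory names unique
            n += 1
            name = f"{base}-{n}"
        seen.add(name)
        task_dir = out_dir / name
        for column, rel_path in COLUMN_TO_FILE.items():
            if record.get(column) is None:
                continue  # missing columns are reported by validate_dataset
            path = task_dir / rel_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(record[column], encoding="utf-8")
        task_dirs.append(task_dir)
    return task_dirs


def validate_dataset(task_dirs: Iterable[Path]) -> dict[str, list[str]]:
    """Run before operation 2: map each incomplete task to its missing files."""
    problems: dict[str, list[str]] = {}
    for task_dir in task_dirs:
        missing = [f for f in REQUIRED_FILES if not (task_dir / f).is_file()]
        if missing:
            problems[task_dir.name] = missing
    return problems
```

Running the validation between the two steps keeps incomplete generations (for example, a record with no Dockerfile value) from reaching Harbor Hub.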

Describe alternatives you've considered

No response

Agent Investigation

No response

Additional context

No response

Checklist

  • I've reviewed existing issues and the documentation
  • This is a design proposal, not a "please build this" request

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
