
feat: dataset versioning #1837

Merged
max-braintrust merged 17 commits into main from dataset-tags-refactor
Apr 23, 2026

@max-braintrust (Contributor) commented Apr 15, 2026

Summary

This PR adds dataset snapshot and environment tag support to the JS SDK. See feature spec here: braintrustdata/braintrust-spec#14

Background

This change adds two friendlier ways to reference dataset versions:

  • Snapshots, which are stable human-readable names for a specific dataset version
  • Environment tags, which are movable aliases like ppe or production that can be repointed over time

These are still just ways of referring to a concrete dataset version (xact_id). The SDK resolves snapshot names and environment tags down to the underlying xact_id before experiment or eval registration, so we keep the existing reproducibility guarantees while making version selection much easier to use.
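To make the resolution step concrete, here is a minimal sketch, assuming a hypothetical `DatasetSelector` union and an in-memory `VersionCatalog`; neither is the SDK's actual type or API. It shows how a snapshot name or environment tag collapses to a single xact_id before registration, and why repointing a tag changes what it resolves to while a snapshot stays pinned.

```typescript
// Hypothetical types for illustration only; not the SDK's real API.
type DatasetSelector =
  | { kind: "version"; xactId: string }
  | { kind: "snapshot"; name: string }
  | { kind: "environment"; tag: string };

interface VersionCatalog {
  // Snapshot names are stable: a name keeps pointing at one xact_id.
  snapshots: Map<string, string>;
  // Environment tags are movable: repointing changes their xact_id.
  environments: Map<string, string>;
}

function lookupVersion(map: Map<string, string>, key: string, label: string): string {
  const xactId = map.get(key);
  if (xactId === undefined) {
    throw new Error(`Unknown ${label}: ${key}`);
  }
  return xactId;
}

// Collapse any selector to a concrete xact_id before registration.
function resolveDatasetVersion(selector: DatasetSelector, catalog: VersionCatalog): string {
  switch (selector.kind) {
    case "version":
      return selector.xactId;
    case "snapshot":
      return lookupVersion(catalog.snapshots, selector.name, "snapshot");
    case "environment":
      return lookupVersion(catalog.environments, selector.tag, "environment tag");
    default: {
      // Compile-time exhaustiveness check over the selector union.
      const unreachable: never = selector;
      throw new Error(`Unhandled selector: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Example catalog with placeholder version ids.
const catalog: VersionCatalog = {
  snapshots: new Map([["baseline-v1", "xact-001"]]),
  environments: new Map([["production", "xact-001"]]),
};
```

After `catalog.environments.set("production", "xact-002")`, the tag `production` resolves to `xact-002` while `baseline-v1` still resolves to `xact-001`. Because only the resolved xact_id is recorded, an experiment registered against a tag keeps the version the tag named at registration time.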

This PR adds:

  • SDK support for initializing datasets by:
    • explicit version (xact_id)
    • snapshot name
    • environment tag
  • Resolution of snapshot and environment selectors to a concrete dataset version internally before eval / experiment registration
  • SDK helpers for dataset snapshots, including:
    • create
    • list
    • update via register/upsert for the current dataset version
    • patch snapshot metadata by id
    • delete
    • restore and restore/preview to return the dataset head to the state at a particular version
  • Dev server support for forwarding dataset version and environment when resolving datasets for remote evals
  • Tests and example coverage for the new version-selection paths
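The snapshot helpers listed above can be modelled with a small in-memory store. This is a hypothetical sketch: `InMemorySnapshotStore`, its method names, and the rule that snapshot names are unique are assumptions for illustration. Restore here simply moves a head pointer, whereas the real restore endpoint may work differently.

```typescript
// Hypothetical in-memory model of the snapshot helpers; names are illustrative.
interface SnapshotRecord {
  id: string;
  name: string;
  xactId: string;
  description?: string;
}

class InMemorySnapshotStore {
  private records = new Map<string, SnapshotRecord>();
  private nextId = 1;
  private headXactId: string;

  constructor(headXactId: string) {
    this.headXactId = headXactId;
  }

  getHead(): string {
    return this.headXactId;
  }

  // create: pin a new snapshot name to an explicit version (names assumed unique).
  create(name: string, xactId: string): SnapshotRecord {
    if (this.findByName(name) !== undefined) {
      throw new Error(`Snapshot "${name}" already exists`);
    }
    const record: SnapshotRecord = { id: `snap-${this.nextId++}`, name, xactId };
    this.records.set(record.id, record);
    return record;
  }

  list(): SnapshotRecord[] {
    return Array.from(this.records.values());
  }

  // register/upsert: point `name` at the current head, creating it if needed.
  upsertForHead(name: string): SnapshotRecord {
    const existing = this.findByName(name);
    if (existing !== undefined) {
      existing.xactId = this.headXactId;
      return existing;
    }
    return this.create(name, this.headXactId);
  }

  // patch: edit metadata by id; the pinned version is left unchanged.
  patch(id: string, changes: { name?: string; description?: string }): SnapshotRecord {
    const record = this.getOrThrow(id);
    if (changes.name !== undefined && changes.name !== record.name && this.findByName(changes.name) !== undefined) {
      throw new Error(`Snapshot "${changes.name}" already exists`);
    }
    if (changes.name !== undefined) record.name = changes.name;
    if (changes.description !== undefined) record.description = changes.description;
    return record;
  }

  delete(id: string): boolean {
    return this.records.delete(id);
  }

  // restore/preview: report where the head would move, without moving it.
  restorePreview(id: string): { from: string; to: string } {
    return { from: this.headXactId, to: this.getOrThrow(id).xactId };
  }

  // restore: in this simplified model, just move the head pointer.
  restore(id: string): string {
    this.headXactId = this.getOrThrow(id).xactId;
    return this.headXactId;
  }

  private getOrThrow(id: string): SnapshotRecord {
    const record = this.records.get(id);
    if (record === undefined) throw new Error(`No snapshot with id ${id}`);
    return record;
  }

  private findByName(name: string): SnapshotRecord | undefined {
    return this.list().find((record) => record.name === name);
  }
}
```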

@max-braintrust max-braintrust marked this pull request as ready for review April 20, 2026 20:46
@max-braintrust max-braintrust changed the title Support dataset versioning feat: dataset versioning Apr 21, 2026
const newNormalized = normalizeClass(newClass);

// Check if normalized versions are similar (one contains significant portion of the other)
const similarityThreshold = Math.min(500, oldNormalized.length * 0.5);
Contributor Author

This heuristic fails when you add substantially to a class's method bodies, so I added validation through the TS parser as a fallback for when it fails. Since the heuristic should be faster, I'm keeping it for the normal case.
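A sketch of the two-tier check described in this comment, under stated assumptions: only the `similarityThreshold` formula comes from the diff. This `normalizeClass`, the leading-chunk comparison, and the `parseFallback` callback (standing in for TS-parser validation) are hypothetical.

```typescript
// Hypothetical stand-in: collapse whitespace so formatting changes don't matter.
function normalizeClass(source: string): string {
  return source.replace(/\s+/g, " ").trim();
}

function classesLookSimilar(
  oldClass: string,
  newClass: string,
  parseFallback: (oldClass: string, newClass: string) => boolean,
): boolean {
  const oldNormalized = normalizeClass(oldClass);
  const newNormalized = normalizeClass(newClass);

  // Fast path: does the new class still contain a significant chunk of the old one?
  const similarityThreshold = Math.min(500, oldNormalized.length * 0.5);
  const leadingChunk = oldNormalized.slice(0, Math.floor(similarityThreshold));
  if (newNormalized.indexOf(leadingChunk) !== -1) {
    return true;
  }

  // Slow path: large edits to method bodies can defeat the heuristic,
  // so defer to a structural comparison (e.g. via the TS parser).
  return parseFallback(oldClass, newClass);
}
```

The cheap string check handles the common case; the parser only runs when the heuristic says no.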

Comment thread js/src/logger.ts
Member

Luca Forstner (@lforst) we really need to split this file up 😭

Comment thread js/src/logger.ts
});
args["dataset_id"] = datasetSelection.datasetId;
if (datasetSelection.datasetVersion !== undefined) {
  args["dataset_version"] = datasetSelection.datasetVersion;
Member

This changes how this worked before: we used to have an `} else {` branch that did `args["dataset_version"] = await (dataset as AnyDataset).version();`. Is that intentional? I guess we do save the `dataset.version()` call.

Contributor Author

No, that should get fixed so this still works for subclasses and for cases where the version is pinned manually. I updated `serializeDatasetForExperiment()` so it always falls back to `.version()` when the dataset isn't resolved through one of the other selectors.
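The fallback described in this reply might look roughly like the sketch below. The interfaces and `serializeDatasetForExperimentSketch` are hypothetical stand-ins; only the `dataset_id`/`dataset_version` keys and the `.version()` call come from the thread.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface DatasetSelectionSketch {
  datasetId: string;
  // Set when a version, snapshot, or environment selector was resolved up front.
  datasetVersion?: string;
}

interface VersionedDataset {
  version(): Promise<string>;
}

async function serializeDatasetForExperimentSketch(
  selection: DatasetSelectionSketch,
  dataset: VersionedDataset,
): Promise<Record<string, string>> {
  const args: Record<string, string> = { dataset_id: selection.datasetId };
  if (selection.datasetVersion !== undefined) {
    args["dataset_version"] = selection.datasetVersion;
  } else {
    // Ask the dataset itself, which covers subclasses and manually pinned versions.
    args["dataset_version"] = await dataset.version();
  }
  return args;
}
```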

Comment thread js/src/logger.ts
Comment thread js/src/logger.ts Outdated
Comment thread js/src/logger.ts Outdated
Comment on lines +3879 to +3880
const snapshots = await getDatasetSnapshots({ state, datasetId });
const match = snapshots.find((snapshot) => snapshot.name === snapshotName);
Member

Can we add a backend endpoint to do this instead? It feels like a lot to scan through all the snapshots client-side.

Contributor Author

Yes, this should already be supported. I split `listSnapshots()` and `getSnapshot()` apart, so this now uses the single-snapshot lookup properly.
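The difference between scanning a full snapshot list and a direct lookup can be sketched as follows. `SnapshotIndex` and `SnapshotLookupSketch` are hypothetical (the real `DatasetSnapshotLookup` definition is not shown here); the keyed maps stand in for a server-side lookup by id or name.

```typescript
// Hypothetical types for illustration only.
interface Snapshot {
  id: string;
  name: string;
  xactId: string;
}

type SnapshotLookupSketch = { id: string } | { name: string };

class SnapshotIndex {
  private byId = new Map<string, Snapshot>();
  private byName = new Map<string, Snapshot>();

  add(snapshot: Snapshot): void {
    this.byId.set(snapshot.id, snapshot);
    this.byName.set(snapshot.name, snapshot);
  }

  // List every snapshot, e.g. for a listing view.
  listSnapshots(): Snapshot[] {
    return Array.from(this.byId.values());
  }

  // Fetch one snapshot by id or name without scanning the full list.
  getSnapshot(lookup: SnapshotLookupSketch): Snapshot | undefined {
    return "id" in lookup ? this.byId.get(lookup.id) : this.byName.get(lookup.name);
  }
}
```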

Comment thread js/src/logger.ts
dataset_id: string;
dataset_version?: string;
dataset_environment?: string;
dataset_snapshot_name?: string;
Member

Do we need to pass `dataset_snapshot_name` into the remote evals created in

async function getDataset(

(I lack a lot of context w/ remote evals so lmk if I'm off the mark!)

Contributor Author

Discussed a bit offline: the remote eval path needs more API changes before it gets added to the SDK.

Comment thread js/src/logger.ts Outdated
xactId: string;
};

type DatasetSnapshotLookup =
Member

Can we export this? It's used by `public async getSnapshot`.

Comment thread .changeset/goofy-hotels-care.md Outdated
---

- (feat) Add dataset snapshot/environment selection support to `init()` and `initDataset()`, including snapshot CRUD helpers and `DatasetSnapshot` type exports.
- (feat) Update `braintrust/dev` to respect `dataset_version` and `dataset_environment` when resolving datasets for evals.
Member

feel free to also add a little extra detail here, like an example code snippet!

@max-braintrust max-braintrust merged commit 3500ec2 into main Apr 23, 2026
52 of 54 checks passed
@max-braintrust max-braintrust deleted the dataset-tags-refactor branch April 23, 2026 23:43
2 participants