chore: Prepare for release (0a2e9115c308)#1940
Merged
Synchronizes the main branch with the release branch. (changed files should generally only be package versions, changeset files, and changelogs) --------- Co-authored-by: Luca Forstner <luca.forstner@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Moves us away from discontinued models for testing.
We were ignoring streamed tool calls. Fixes #1846
Now looks like this <img width="288" height="301" alt="Screenshot 2026-04-22 at 13 32 24" src="https://github.com/user-attachments/assets/26f96a2c-659f-4a42-8429-f91fb034a22f" />
### Summary

This PR adds dataset snapshot and environment tag support to the JS SDK. See feature spec here: braintrustdata/braintrust-spec#14

### Background

This change adds two friendlier ways to reference dataset versions:

- Snapshots, which are stable human-readable names for a specific dataset version
- Environment tags, which are movable aliases like ppe or production that can be repointed over time

These are still just ways of referring to a concrete dataset version (xact_id). The SDK resolves snapshot names and environment tags down to the underlying xact_id before experiment or eval registration, so we keep the existing reproducibility guarantees while making version selection much easier to use.

### This PR adds:

- SDK support for initializing datasets by:
  - explicit version (xact_id)
  - snapshot name
  - environment tag
- Resolution of snapshot and environment selectors to a concrete dataset version internally before eval / experiment registration
- SDK helpers for dataset snapshots, including:
  - create
  - list
  - update via register/upsert for the current dataset version
  - patch snapshot metadata by id
  - delete
  - restore and restore/preview to return the dataset head to the state at a particular version
- Dev server support for forwarding dataset version and environment when resolving datasets for remote evals
- Tests and example coverage for the new version-selection paths
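To illustrate the resolution step described above, here is a minimal sketch of how a snapshot name or environment tag might be reduced to a concrete xact_id before registration. Every name in it (`DatasetSelector`, `DatasetVersionIndex`, `resolveDatasetVersion`, the sample snapshot and tag values) is hypothetical and chosen for illustration; it is not the SDK's actual API, and the in-memory maps stand in for whatever lookup the SDK really performs.

```typescript
// Hypothetical types for illustration only; not the actual Braintrust SDK API.
type DatasetSelector =
  | { kind: "version"; xactId: string }
  | { kind: "snapshot"; name: string }
  | { kind: "environment"; tag: string };

// Assumed in-memory stand-in for the server-side lookup tables.
interface DatasetVersionIndex {
  snapshots: Map<string, string>; // snapshot name -> xact_id (stable)
  environments: Map<string, string>; // environment tag -> xact_id (movable)
}

// Reduce any selector to the concrete xact_id it currently refers to.
function resolveDatasetVersion(
  selector: DatasetSelector,
  index: DatasetVersionIndex,
): string {
  switch (selector.kind) {
    case "version":
      // Already concrete; nothing to look up.
      return selector.xactId;
    case "snapshot": {
      const xactId = index.snapshots.get(selector.name);
      if (xactId === undefined) {
        throw new Error(`Unknown snapshot: ${selector.name}`);
      }
      return xactId;
    }
    case "environment": {
      const xactId = index.environments.get(selector.tag);
      if (xactId === undefined) {
        throw new Error(`Unknown environment tag: ${selector.tag}`);
      }
      return xactId;
    }
  }
}

// Sample data (made up for this sketch).
const versionIndex: DatasetVersionIndex = {
  snapshots: new Map([["baseline-v1", "1000"]]),
  environments: new Map([["production", "1000"]]),
};

// The experiment records the resolved xact_id, not the selector itself.
const pinnedVersion = resolveDatasetVersion(
  { kind: "environment", tag: "production" },
  versionIndex,
);
```

Because only the resolved xact_id is recorded, repointing the `production` tag later would not change which rows an already-registered experiment refers to, which is the reproducibility property the description calls out.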
We should also wipe `util/dist`
Stores `_internal_btql` filters for experiment datasets in the experiment metadata. Right now we don’t persist those filter options, so we lose the ability to reconstruct the exact subset of rows an experiment ran against. If we save them with the experiment, we can recreate the same row set later instead of having to guess.

This unlocks a few useful things:

- Re-running an experiment on the exact same data it originally saw.
- Showing the BTQL filter used by an experiment in the Braintrust UI.
- Anything else that depends on reconstructing the precise rows that were initially fed into an experiment.
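As a rough sketch of why persisting the filter matters, the example below stores a filter alongside an experiment record and later replays it to recover the same rows. The shapes here (`Row`, `StoredFilter`, `ExperimentRecord`, `runExperiment`) are hypothetical, and the single "required tag" condition is only a stand-in for a real BTQL filter; the actual metadata layout is not shown in this PR.

```typescript
// Hypothetical shapes for illustration; the real metadata layout may differ.
type Row = { id: string; tags: string[] };

// Stand-in for a BTQL filter: a single "row must carry this tag" condition.
interface StoredFilter {
  requiredTag: string;
}

interface ExperimentRecord {
  name: string;
  metadata: { datasetFilter?: StoredFilter };
}

// Apply the (optional) filter to the dataset rows.
function applyFilter(rows: Row[], filter?: StoredFilter): Row[] {
  if (filter === undefined) return rows;
  return rows.filter((row) => row.tags.includes(filter.requiredTag));
}

// Run an experiment and persist the filter with it, so the row set
// can be reconstructed later instead of guessed.
function runExperiment(
  name: string,
  rows: Row[],
  filter?: StoredFilter,
): { record: ExperimentRecord; rowsUsed: Row[] } {
  const rowsUsed = applyFilter(rows, filter);
  return { record: { name, metadata: { datasetFilter: filter } }, rowsUsed };
}

// Sample data (made up for this sketch).
const datasetRows: Row[] = [
  { id: "a", tags: ["smoke"] },
  { id: "b", tags: ["full"] },
  { id: "c", tags: ["smoke", "full"] },
];

const { record, rowsUsed } = runExperiment("exp-1", datasetRows, {
  requiredTag: "smoke",
});

// Later: rebuild the exact subset from the stored metadata alone.
const replayedRows = applyFilter(datasetRows, record.metadata.datasetFilter);
```

Replaying only yields the same rows if the dataset contents are also pinned to the same version; the stored filter covers the subset selection, not changes to the underlying data.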
The version being a prerelease actually tripped up our release process because we didn't define a tag. In general, we should probably not have rc versions in the package.json files either.
Automated regeneration of SDK types. Co-authored-by: braintrust-bot[bot] <215900051+braintrust-bot[bot]@users.noreply.github.com>
It's too inconsistent.
`..eval.js` -> `.eval.js`
For some reason the model started responding weirdly.
I don't like that vitest is spamming CI summaries.
Fixes #1919 Can probably still be improved but a first iteration. <img width="288" height="295" alt="Screenshot 2026-05-04 at 10 58 19" src="https://github.com/user-attachments/assets/32321a58-807a-4512-aa91-ae9b519136a4" />
Prepares a release by updating changelogs and package versions, and synchronizing everything to the release branch.