v0.0.2

Latest

github-actions released this 08 May 19:09

· 40 commits to main since this release

9141dc1

Version 0.0.2 -- Released 2026-05-08

Added

Added per-instance predictor base class (InstancePredictor) and random example
Users can now specify patterns matching helm runs, suites, or all outputs as predictor input
Added symbol sweeping capability to evaluation card evaluator
Added modal CLI for evaluation.py script
Added support for KWDagger pipelines in evaluation cards (both as explicit pipelines and as YAML-defined pipelines)
Added support for symbol overrides to magnet evaluate with the --override argument
Added parallelization to magnet evaluate with the --jobs (and --parallel_backend) arguments
Added claim resolution and final result file output to magnet evaluate
Added support for claim_aggregation_strategy to evaluation cards (supporting any, all, and fraction strategies)
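
To illustrate the any, all, and fraction strategies named in the last item, here is a minimal sketch of how per-claim outcomes might be combined into one verdict. It is not the project's implementation: the function name aggregate_claims, the min_fraction parameter, and its 0.5 default are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def aggregate_claims(
    results: Iterable[bool],
    strategy: str = "all",
    min_fraction: float = 0.5,  # hypothetical threshold, used only by "fraction"
) -> bool:
    """Combine per-claim pass/fail outcomes into a single verdict.

    Hypothetical helper: the strategy names mirror the changelog entry,
    but the signature and threshold semantics are assumptions.
    """
    outcomes = list(results)
    if not outcomes:
        raise ValueError("no claim results to aggregate")

    if strategy == "any":
        # Pass if at least one claim holds.
        return any(outcomes)
    if strategy == "all":
        # Pass only if every claim holds.
        return all(outcomes)
    if strategy == "fraction":
        # Pass if the share of claims that hold reaches the threshold.
        return sum(outcomes) / len(outcomes) >= min_fraction
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With outcomes [True, True, False], the "any" strategy passes, "all" fails, and "fraction" passes for any threshold up to two thirds.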

Changed

Switched to single argument path input for example predictors
Cleaned up predicted vs. actual code for predictors
HelmRuns.coerce can now accept a more expressive set of inputs
BREAKING: You must now specify helm_runs when calling the predictor.
magnet download helm can now download multiple benchmarks

Fixed

Fixed doctests and README wrt predictor refactors
Updated predict_inputs_exploration.ipynb notebook wrt API updates
