Usability improvements: transparent levels, compute(), better errors #834
Conversation
…errors, BenchRunCfg factories
Address downstream feedback that bencher is unintuitive:
1. Make `level` transparent:
- Extract LEVEL_SAMPLES constant with documented mapping
- Add BenchRunCfg.level_to_samples() for programmatic lookup
- Improve level docstring with full mapping table
- Add samples_per_var parameter as direct alternative to level
- Log effective sample counts at INFO level
2. Reduce __call__ boilerplate:
- Add __init_subclass__ auto-wrapping so subclasses can define
compute() instead of the update_params_from_kwargs + super().__call__
sandwich. Classic __call__ pattern still works unchanged.
3. Better error messages for string var refs:
- Wrap dict lookup with helpful KeyError listing available params
- Use difflib.get_close_matches for "Did you mean?" suggestions
4. BenchRunCfg documentation & factories:
- Restructure docstring with grouped parameter sections and examples
- Add for_time_series() and for_ci() factory classmethods
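The `__init_subclass__` auto-wrapping in point 2 could look roughly like the sketch below. This is illustrative only: the class and hook names (`SweepWorker`, `update_params_from_kwargs`) are stand-ins for bencher's actual `ParametrizedSweep` internals, which are not shown in this PR text.

```python
# Minimal sketch of the __init_subclass__ auto-wrap idea.
# Names here are illustrative stand-ins, not bencher's actual internals.
class SweepWorker:
    def update_params_from_kwargs(self, **kwargs):
        # Stand-in for param-based attribute updates.
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only wrap subclasses that define compute() and do NOT override
        # __call__ themselves, so the classic __call__ pattern is untouched.
        if "compute" in cls.__dict__ and "__call__" not in cls.__dict__:
            def __call__(self, **kw):
                # The old update_params_from_kwargs + super().__call__()
                # sandwich, done once here instead of in every subclass.
                self.update_params_from_kwargs(**kw)
                return self.compute()
            cls.__call__ = __call__


class Square(SweepWorker):
    x = 0.0

    def compute(self):
        return {"y": self.x * self.x}


print(Square()(x=3.0))  # -> {'y': 9.0}; the call protocol still works
```

Because the wrap is keyed on `"compute" in cls.__dict__`, a subclass that defines its own `__call__` is left alone, which matches the "opt-in only" claim in the test plan.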
Reviewer's Guide

Makes benchmark configuration and usage more transparent by centralizing level→sample mapping, adding an explicit samples_per_var override and convenience factory methods on BenchRunCfg, improving documentation, logging and error messages, and tightening tests around these behaviors.

Sequence diagram for plot_sweep applying samples_per_var and level

```mermaid
sequenceDiagram
    actor User
    participant PlotAPI as plot_sweep
    participant RunCfg as BenchRunCfg
    participant Inputs as SweepVariables
    User->>PlotAPI: call plot_sweep(input_vars_in, run_cfg)
    PlotAPI->>RunCfg: read samples_per_var
    alt samples_per_var is not None
        PlotAPI->>Inputs: with_samples(run_cfg.samples_per_var) on each variable
        PlotAPI-->>User: log info "samples_per_var applied"
    else samples_per_var is None
        PlotAPI->>RunCfg: read level
        alt level > 0
            PlotAPI->>Inputs: with_level(run_cfg.level) on each variable
            PlotAPI->>RunCfg: BenchRunCfg.level_to_samples(run_cfg.level)
            PlotAPI-->>User: log info "level -> samples per variable"
        else level == 0
            PlotAPI-->>User: use each variable's own samples
        end
    end
    PlotAPI-->>User: continue with sweep execution
```
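The precedence in the diagram (explicit `samples_per_var` wins, then `level`, then each variable's own default) can be sketched as plain Python. The helper name and the `LEVEL_SAMPLES` values are assumptions; only the level-5 → 9 mapping is confirmed by the docstring example quoted later in this review.

```python
# Illustrative resolution order for samples per variable, mirroring the
# diagram: samples_per_var > level > the variable's own default.
# LEVEL_SAMPLES values are assumed (only level 5 -> 9 is documented).
LEVEL_SAMPLES = [0, 1, 2, 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025]

def resolve_samples(default_samples: int, level: int = 0, samples_per_var=None) -> int:
    if samples_per_var is not None:   # explicit override wins
        return samples_per_var
    if level > 0:                     # fall back to the level table
        return LEVEL_SAMPLES[min(level, len(LEVEL_SAMPLES) - 1)]
    return default_samples            # otherwise keep the variable's own setting

print(resolve_samples(10, level=5, samples_per_var=7))  # -> 7
print(resolve_samples(10, level=5))                     # -> 9
print(resolve_samples(10))                              # -> 10
```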
Sequence diagram for SweepExecutor parameter lookup with helpful KeyError

```mermaid
sequenceDiagram
    participant Caller
    participant Exec as SweepExecutor
    participant Worker as ParametrizedSweep
    Caller->>Exec: convert_vars_to_params("var_name", Worker, var_type)
    Exec->>Exec: _lookup_param_by_name(Worker, "var_name", var_type)
    Exec->>Worker: param.objects(instance=False)
    Worker-->>Exec: params dict
    alt name in params
        Exec-->>Exec: return matching param.Parameter
        Exec-->>Caller: converted param.Parameter
    else name not in params
        Exec-->>Exec: build available list and close matches
        Exec-->>Caller: raise KeyError with available names and "Did you mean" suggestions
    end
```
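The error path in the diagram can be sketched with `difflib.get_close_matches`, which the PR names explicitly. The function below is a self-contained stand-in, not the real `_lookup_param_by_name`; the exact message wording is taken from the error-text assertions in the tests quoted later ("not found", "Available parameters", "Did you mean").

```python
import difflib

def lookup_param(params: dict, name: str):
    """Sketch of the helpful-KeyError lookup shown in the diagram above."""
    if name in params:
        return params[name]
    available = ", ".join(sorted(params))
    msg = f"Parameter '{name}' not found. Available parameters: {available}."
    # difflib's default cutoff (0.6) already drops wildly unrelated names.
    suggestions = difflib.get_close_matches(name, list(params), n=3, cutoff=0.6)
    if suggestions:  # omit the hint entirely when nothing is close
        msg += f" Did you mean: {', '.join(suggestions)}?"
    raise KeyError(msg)

params = {"theta": 1, "offset": 2, "noisy": 3}
try:
    lookup_param(params, "thetaa")
except KeyError as e:
    print(e)  # mentions 'thetaa', lists available names, suggests 'theta'
```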
Class diagram for updated BenchRunCfg, SweepBase, and SweepExecutor usability features

```mermaid
classDiagram
    class BenchPlotSrvCfg {
    }
    class BenchRunCfg {
        +int level
        +int samples_per_var
        +BenchRunCfg deep()
        +int level_to_samples(int level, int max_level)
        +BenchRunCfg for_time_series(str time_event, **kwargs)
        +BenchRunCfg for_ci(str time_event, **kwargs)
        +BenchRunCfg with_defaults(BenchRunCfg run_cfg, **defaults)
    }
    class SweepBase {
        +int samples
        +SweepBase with_samples(int samples)
        +SweepBase with_sample_values(list values)
        +tuple~SweepBase, Any~ with_const(Any const_value)
        +SweepBase with_level(int level, int max_level)
    }
    class LEVEL_SAMPLES {
        <<constant>>
        +list~int~ values
    }
    class SweepExecutor {
        +int cache_size
        +FutureCache sample_cache
        +SweepExecutor __init__(int cache_size)
        +param.Parameter _lookup_param_by_name(ParametrizedSweep worker_class_instance, str name, str var_type)
        +param.Parameter convert_vars_to_params(param.Parameter|str|dict|tuple variable, ParametrizedSweep worker_class_instance, str var_type)
    }
    BenchPlotSrvCfg <|-- BenchRunCfg
    LEVEL_SAMPLES --> SweepBase : used for level mapping
    BenchRunCfg --> LEVEL_SAMPLES : used by level_to_samples
    SweepExecutor --> ParametrizedSweep : operates on
    SweepExecutor --> param.Parameter : returns and manipulates
    BenchRunCfg ..> SweepExecutor : consumed via run_cfg in execution path
```
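The factory-classmethod pattern shown on `BenchRunCfg` in the diagram can be sketched with a plain dataclass. Everything below is a stand-in: the real `BenchRunCfg` is a `param`-based class, and the fields each factory pre-sets are assumptions, since the PR text only names the factories. (Note that a later commit in this conversation removes `for_ci()` and `for_time_series()` again as trivial wrappers.)

```python
# Hedged sketch of the factory-classmethod idea from the class diagram.
# RunCfg is a plain stand-in for BenchRunCfg; the pre-set fields are assumed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RunCfg:
    repeats: int = 1
    over_time: bool = False
    time_event: Optional[str] = None
    cache_results: bool = False

    @classmethod
    def for_time_series(cls, time_event: str, **kwargs) -> "RunCfg":
        # Pre-set the fields a time-series run plausibly always needs.
        return cls(over_time=True, time_event=time_event, cache_results=True, **kwargs)

    @classmethod
    def for_ci(cls, **kwargs) -> "RunCfg":
        # CI runs favour determinism and caching over sample density.
        return cls(repeats=1, cache_results=True, **kwargs)

cfg = RunCfg.for_time_series("nightly", repeats=3)
print(cfg)
```

The value of such factories is that the call site documents intent (`for_ci()`) instead of a bag of keyword arguments, at the cost of extra API surface, which is exactly the trade-off the later commit resolves the other way.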
File-Level Changes
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 950 |
| Total time | 77.00s |
| Mean | 0.0811s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 20.674 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.404 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 3.121 |
| test.test_generated_examples::test_generated_example[result_types/result_image/result_image_to_video.py] | 3.103 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 1.850 |
| test.test_generated_examples::test_generated_example[1_float/over_time_repeats/sweep_1_float_3_cat_over_time_repeats.py] | 1.172 |
| test.test_result_bool.TestVolumeResult::test_volume_3float_multi_repeat | 1.121 |
| test.test_generated_examples::test_generated_example[1_float/over_time/sweep_1_float_3_cat_over_time.py] | 0.920 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 0.918 |
| test.test_generated_examples::test_generated_example[3_float/over_time/sweep_3_float_2_cat_over_time.py] | 0.874 |
Updated by Performance Tracking workflow
The __init_subclass__/compute() pattern is superseded by the benchmark() method already on ParametrizedSweep, which solves the same boilerplate problem more simply via runtime dispatch.
Hey - I've found 3 issues, and left some high level feedback:
- The `LEVEL_SAMPLES` table and `BenchRunCfg.level` bounds are effectively capped at 12, but `LEVEL_SAMPLES` defines a 13th entry and `level_to_samples()` validates against `len(LEVEL_SAMPLES) - 1`, which means `level=13` is technically accepted there but not elsewhere; consider aligning the array length, validation, and documentation so the supported range is consistent across the codebase.
- In `BenchRunCfg.level_to_samples()`, the validation and error message are tied to `len(LEVEL_SAMPLES)` rather than the `max_level` argument, which can be confusing when callers pass a custom `max_level`; it would be clearer to validate and phrase the error in terms of `max_level` or to remove `max_level` if clamping beyond the table is not required.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `LEVEL_SAMPLES` table and `BenchRunCfg.level` bounds are effectively capped at 12, but `LEVEL_SAMPLES` defines a 13th entry and `level_to_samples()` validates against `len(LEVEL_SAMPLES) - 1`, which means `level=13` is technically accepted there but not elsewhere; consider aligning the array length, validation, and documentation so the supported range is consistent across the codebase.
- In `BenchRunCfg.level_to_samples()`, the validation and error message are tied to `len(LEVEL_SAMPLES)` rather than the `max_level` argument, which can be confusing when callers pass a custom `max_level`; it would be clearer to validate and phrase the error in terms of `max_level` or to remove `max_level` if clamping beyond the table is not required.
## Individual Comments
### Comment 1
<location path="bencher/bench_cfg.py" line_range="387-396" />
<code_context>
return BenchRunCfg(**vars(parser.parse_args()))
+ @staticmethod
+ def level_to_samples(level: int, max_level: int = 12) -> int:
+ """Return the number of samples-per-variable for a given *level*.
+
+ Args:
+ level: Sampling level (1-12).
+ max_level: Cap applied before lookup. Defaults to 12.
+
+ Returns:
+ The sample count for this level.
+
+ Raises:
+ ValueError: If *level* is out of range.
+
+ Example::
+
+ >>> BenchRunCfg.level_to_samples(5)
+ 9
+ """
+ if level < 1 or level >= len(LEVEL_SAMPLES):
+ raise ValueError(f"level must be between 1 and {len(LEVEL_SAMPLES) - 1}, got {level}")
+ return LEVEL_SAMPLES[min(max_level, level)]
</code_context>
<issue_to_address>
**issue:** Guard against `max_level` values below 1 to avoid returning 0 samples.
`level_to_samples` validates `level` but not `max_level`. With `max_level=0`, `min(max_level, level)` is 0, so the function returns `LEVEL_SAMPLES[0] == 0`, conflicting with the documented guarantee of a positive sample count. Please either clamp `max_level` into `[1, len(LEVEL_SAMPLES) - 1]` or raise if `max_level < 1` to avoid this silent misconfiguration.
</issue_to_address>
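A fix along the lines the reviewer asks for would validate both arguments before the lookup. This is a sketch, not the project's actual fix; the `LEVEL_SAMPLES` values are assumed (only the documented `level_to_samples(5) == 9` example constrains them).

```python
# Sketch of the guarded version: reject out-of-range max_level instead of
# letting min(max_level, level) silently select LEVEL_SAMPLES[0] == 0.
LEVEL_SAMPLES = [0, 1, 2, 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025]  # assumed table

def level_to_samples(level: int, max_level: int = 12) -> int:
    top = len(LEVEL_SAMPLES) - 1
    if not 1 <= level <= top:
        raise ValueError(f"level must be between 1 and {top}, got {level}")
    if not 1 <= max_level <= top:
        raise ValueError(f"max_level must be between 1 and {top}, got {max_level}")
    return LEVEL_SAMPLES[min(max_level, level)]

print(level_to_samples(5))  # -> 9, matching the documented example
```

Clamping `max_level` into range instead of raising would also satisfy the comment; raising is shown here because it surfaces the misconfiguration rather than hiding it.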
### Comment 2
<location path="test/test_usability.py" line_range="55-73" />
<code_context>
+ cfg = BenchRunCfg()
+ self.assertIsNone(cfg.samples_per_var)
+
+ def test_samples_per_var_overrides_level(self):
+ """When samples_per_var is set, the bench should use that count regardless of level."""
+ bench = BenchFloat().to_bench(bn.BenchRunCfg(headless=True, samples_per_var=7))
+ result = bench.plot_sweep()
+ # The sweep should have used 7 samples for theta
+ ds = result.ds
+ self.assertEqual(len(ds.coords["theta"]), 7)
</code_context>
<issue_to_address>
**suggestion (testing):** Also test interaction when both level and samples_per_var are set
This only covers the case where `samples_per_var` is set alone. To cover the documented precedence, please add a test where both `level` and `samples_per_var` are set, e.g.:
```python
cfg = BenchRunCfg(headless=True, level=5, samples_per_var=7)
bench = BenchFloat().to_bench(cfg)
result = bench.plot_sweep()
self.assertEqual(len(result.ds.coords["theta"]), 7)
```
This ensures `samples_per_var` overrides `level` when both are provided.
```suggestion
class TestSamplesPerVar(unittest.TestCase):
def test_default_is_none(self):
cfg = BenchRunCfg()
self.assertIsNone(cfg.samples_per_var)
def test_samples_per_var_overrides_level_when_only_samples_per_var_set(self):
"""When samples_per_var is set, the bench should use that count."""
bench = BenchFloat().to_bench(bn.BenchRunCfg(headless=True, samples_per_var=7))
result = bench.plot_sweep()
# The sweep should have used 7 samples for theta
ds = result.ds
self.assertEqual(len(ds.coords["theta"]), 7)
def test_samples_per_var_overrides_level_when_both_set(self):
"""When both level and samples_per_var are set, samples_per_var takes precedence."""
cfg = BenchRunCfg(headless=True, level=5, samples_per_var=7)
bench = BenchFloat().to_bench(cfg)
result = bench.plot_sweep()
# Even though level=5, we should still get 7 samples because samples_per_var overrides level.
ds = result.ds
self.assertEqual(len(ds.coords["theta"]), 7)
def test_level_still_works(self):
bench = BenchFloat().to_bench(bn.BenchRunCfg(headless=True, level=3))
result = bench.plot_sweep()
ds = result.ds
# level 3 → 3 samples
self.assertEqual(len(ds.coords["theta"]), 3)
```
</issue_to_address>
### Comment 3
<location path="test/test_sweep_executor.py" line_range="178-187" />
<code_context>
self.assertEqual(result.name, "theta")
# The parameter should have been processed with level adjustment
+ def test_convert_vars_to_params_bad_string_gives_helpful_error(self):
+ """Test that a typo in a string variable name gives a helpful KeyError."""
+ with self.assertRaises(KeyError) as ctx:
+ self.executor.convert_vars_to_params(
+ "thetaa",
+ "input",
+ None,
+ worker_class_instance=self.worker_instance,
+ worker_input_cfg=ExampleBenchCfg,
+ )
+ msg = str(ctx.exception)
+ self.assertIn("thetaa", msg)
+ self.assertIn("not found", msg)
+ self.assertIn("Available parameters", msg)
+ self.assertIn("theta", msg) # "Did you mean" suggestion
+
+ def test_convert_vars_to_params_bad_dict_name_gives_helpful_error(self):
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for when there are no close matches to avoid brittle assumptions about suggestions
To better exercise `_lookup_param_by_name`, please add a case where there are no close matches (e.g. `"zzzzzz"`). That will verify the error still mentions the missing name and available parameters, and that the "Did you mean" line is omitted (or at least doesn’t surface unrelated suggestions), keeping behavior stable if the suggestion logic or `difflib` tuning changes.
</issue_to_address>
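A test in the shape Comment 3 suggests might look like the sketch below. The lookup helper is a local, hypothetical stand-in for `SweepExecutor`'s lookup (kept inline so the sketch is runnable on its own), and the exact message fragments asserted are assumptions based on the error text quoted in this review.

```python
import difflib
import unittest

def lookup(params: dict, name: str):
    # Hypothetical stand-in for the executor's parameter lookup.
    if name in params:
        return params[name]
    msg = f"'{name}' not found. Available parameters: {', '.join(sorted(params))}."
    close = difflib.get_close_matches(name, list(params))
    if close:
        msg += f" Did you mean: {', '.join(close)}?"
    raise KeyError(msg)

class TestNoCloseMatches(unittest.TestCase):
    def test_gibberish_name_lists_params_without_suggestions(self):
        # A wildly wrong name should still raise a helpful KeyError,
        # but without surfacing unrelated "Did you mean" suggestions.
        with self.assertRaises(KeyError) as ctx:
            lookup({"theta": 1, "offset": 2}, "zzzzzz")
        msg = str(ctx.exception)
        self.assertIn("zzzzzz", msg)
        self.assertIn("Available parameters", msg)
        self.assertNotIn("Did you mean", msg)

# Run the single case directly without a unittest runner.
TestNoCloseMatches("test_gibberish_name_lists_params_without_suggestions").debug()
```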
Resolve conflicts:
- bench_cfg.py: keep PR's restructured docstring with parameter groups
- sweep_executor.py: use main's _resolve_param (drop redundant _lookup_param_by_name)
- sweep_base.py: use PR's LEVEL_SAMPLES constant + main's list() pickle fix
- Update tests to match _resolve_param error format
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1224 |
| Total time | 109.65s |
| Mean | 0.0896s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 21.832 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.444 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.966 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 3.065 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.047 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.894 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.484 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.154 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.083 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.082 |
Updated by Performance Tracking workflow
- Remove for_ci() and for_time_series() factory classmethods (trivial constructor wrappers that don't justify extra API surface)
- Restore full Attributes docstring on BenchRunCfg, merged with the new quick-start examples and level-to-samples table
- Add samples_per_var to Attributes list
- Bump holobench version to 1.79.0
- Update CHANGELOG with all changes since 1.75.2
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1220 |
| Total time | 107.30s |
| Mean | 0.0880s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 21.047 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.174 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.930 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.048 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 3.043 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.872 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.498 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.122 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.090 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.035 |
Updated by Performance Tracking workflow
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1220 |
| Total time | 105.84s |
| Mean | 0.0868s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 21.060 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.239 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.790 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.012 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.991 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.802 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.442 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.078 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.049 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.036 |
Updated by Performance Tracking workflow
…ersion bump

The CHANGELOG was missing entries for releases 1.76.0, 1.77.0, and 1.78.0. Add per-release sections based on git tag ranges. Move post-1.78.0 changes (not yet released) into [Unreleased]. Revert the version bump to 1.79.0 since that will happen at release time.
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1220 |
| Total time | 106.05s |
| Mean | 0.0869s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 20.930 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.234 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.835 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.038 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 3.018 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.777 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.421 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.111 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.079 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.045 |
Updated by Performance Tracking workflow
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1220 |
| Total time | 105.12s |
| Mean | 0.0862s |
| Median | 0.0020s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 20.833 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.100 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 3.823 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.028 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.984 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.768 |
| test.test_bencher.TestBencher::test_combinations_over_time | 1.423 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.084 |
| test.test_optuna_result.TestOptunaReportRouting::test_optuna_plots_per_sweep_tab | 1.069 |
| test.test_optuna_result.TestOptunaResult::test_collect_optuna_plots_with_repeats | 1.030 |
Updated by Performance Tracking workflow
Summary

Addresses downstream user feedback that bencher is unintuitive. Four targeted changes:

1. Transparent `level` parameter: extract `LEVEL_SAMPLES` constant, add `BenchRunCfg.level_to_samples()` lookup, improved docstring with mapping table, new `samples_per_var` parameter as a direct alternative to level, INFO-level logging of effective sample counts.
2. Less `__call__` boilerplate: `__init_subclass__` auto-wrapping lets subclasses define `compute()` instead of the `update_params_from_kwargs` + `super().__call__()` boilerplate. Classic `__call__` pattern still works unchanged.
3. Better errors for string var refs: a bad name now raises a `KeyError` listing available parameter names with "Did you mean?" suggestions via `difflib.get_close_matches`.
4. `BenchRunCfg` docs & factories: restructured docstring with grouped parameter sections and quick-start examples. New `for_time_series()` and `for_ci()` factory classmethods.

Test plan

- `pixi run ci` passes (945 tests, format, lint all clean)
- New tests in `test/test_usability.py` cover the `compute()` method, `LEVEL_SAMPLES`, `level_to_samples()`, `samples_per_var`, and the factory classmethods (16 tests)
- New tests in `test/test_sweep_executor.py` cover helpful error messages on bad string/dict var names (2 tests)
- `compute()` wrapping is opt-in only

🤖 Generated with Claude Code
Summary by Sourcery
Improve benchmark configuration usability, sample control, and error feedback for sweep variables.
New Features:
Bug Fixes:
Enhancements:
Tests: