Test all models #449
Conversation
/gemini review
Code Review
This pull request significantly improves test coverage by parameterizing tests to run against all available TabPFN models, which is a great enhancement. The approach of using a full grid for a primary model and a smoke test for others is clever and keeps CI times reasonable. I've identified a minor robustness issue where the tests could crash if no models are found. Adding a check to gracefully skip the tests in this scenario would make the test suite more resilient. Overall, this is a solid contribution to improving the project's test quality.
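For illustration, the suggested guard could look something like the sketch below, assuming pytest; the discovery variable name is a placeholder, not code from this PR.

```python
# Sketch of the suggested guard: skip the module instead of crashing when the
# programmatic model discovery returns nothing. Names are placeholders.
import pytest

discovered_model_files: list[str] = []  # result of the model discovery step

if not discovered_model_files:
    pytest.skip("No TabPFN model files found; skipping model tests.", allow_module_level=True)
```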
bugbot run
LeoGrin left a comment
LGTM, just two comments
Not sure about the failing test; it's probably just easier to go over the precision limit by random chance if we test more models, right?
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: noahho <Noah.homa@gmail.com>
Motivation and Context
This patch expands our test coverage to run against all available TabPFN models while keeping CI time reasonable.
- Use ModelSource in tests to programmatically fetch model filenames (a sketch follows below).
- Avoid "auto" model selection inside tests to make failures attributable to specific model artifacts.

This addresses gaps identified in “Run tests with all models” and aligns with feedback that initialization and interface tests should validate each shipped model.
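As a rough illustration (not the exact code in this patch), a parameterized test built from programmatically discovered model files might look like the following. The ModelSource import path, its accessor, and the model_path constructor argument are assumptions here, not the verified TabPFN API.

```python
# Illustrative only: the import path and the attributes used to list model
# filenames are assumptions, not the verified TabPFN API.
import pytest

try:
    from tabpfn.model.loading import ModelSource  # hypothetical import path

    # Hypothetical accessor returning the shipped classifier checkpoint names.
    MODEL_FILENAMES = list(ModelSource.get_classifier_v2().filenames)
except Exception:  # keep the sketch importable even without TabPFN installed
    MODEL_FILENAMES = []


@pytest.mark.parametrize("model_name", MODEL_FILENAMES)
def test_each_model_initializes(model_name):
    # Use an explicit model (never "auto") so a failure points at one artifact.
    from tabpfn import TabPFNClassifier

    clf = TabPFNClassifier(model_path=model_name)  # model_path kwarg is assumed
    assert clf is not None
```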
Public API Changes
(Only test code is modified.)
How Has This Been Tested?
Checklist
- CHANGELOG.md entry: N/A (tests only).
Implementation Notes
- _full_grid: exhaustive combos only for the first model path.
- _smoke_grid: a single, fast combo for each remaining model path.
- all_combinations = list(_full_grid) + list(_smoke_grid), as sketched below.
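A minimal sketch of the split; the model names and parameter axes below are placeholders, not the actual test fixtures.

```python
# Illustrative sketch of the full-grid / smoke-grid split; model names and
# parameter axes are placeholders, not the real fixture values.
import itertools

model_paths = ["model_a.ckpt", "model_b.ckpt", "model_c.ckpt"]  # discovered programmatically
devices = ["cpu", "cuda"]
fit_modes = ["fit_preprocessors", "low_memory"]

# Full grid: every combination, but only for the first model path.
_full_grid = list(itertools.product(model_paths[:1], devices, fit_modes))

# Smoke grid: one fast combination for each remaining model path.
_smoke_grid = [(path, "cpu", fit_modes[0]) for path in model_paths[1:]]

all_combinations = list(_full_grid) + list(_smoke_grid)
```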
Prior Work / Acknowledgements
Big thanks to the work in PR #437 “Run tests with all models” (by @martino-vic), which motivated this direction and highlighted the need to validate every shipped model. That PR is good prior art but is currently not mergeable (pending checks/CLA and minor integration issues). This patch folds in the workable pieces (programmatic model discovery + targeted grid split) and aligns them with the current test suite and style rules.