Test all models #449
Conversation
/gemini review
Code Review
This pull request significantly improves test coverage by parameterizing tests to run against all available TabPFN models, which is a great enhancement. The approach of using a full grid for a primary model and a smoke test for others is clever and keeps CI times reasonable. I've identified a minor robustness issue where the tests could crash if no models are found. Adding a check to gracefully skip the tests in this scenario would make the test suite more resilient. Overall, this is a solid contribution to improving the project's test quality.
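For illustration, the suggested guard could look something like the sketch below, assuming pytest; the discovery variable name is a placeholder, not code from this PR.

```python
# Sketch of the suggested guard: skip the module instead of crashing when the
# programmatic model discovery returns nothing. Names are placeholders.
import pytest

discovered_model_files: list[str] = []  # result of the model discovery step

if not discovered_model_files:
    pytest.skip("No TabPFN model files found; skipping model tests.", allow_module_level=True)
```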
bugbot run
LeoGrin left a comment
LGTM, just two comments
Not sure about the failing test; it's probably just easier to go over the precision limit by random chance if we test more models, right?
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: noahho <Noah.homa@gmail.com>
Motivation and Context
This patch expands our test coverage to run against all available TabPFN models while keeping CI time reasonable.
- Use ModelSource in tests to programmatically fetch model filenames (a sketch follows below).
- Avoid "auto" model selection inside tests to make failures attributable to specific model artifacts.

This addresses gaps identified in “Run tests with all models” and aligns with feedback that initialization and interface tests should validate each shipped model.
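As a rough illustration (not the exact code in this patch), a parameterized test built from programmatically discovered model files might look like the following. The ModelSource import path, its accessor, and the model_path constructor argument are assumptions here, not the verified TabPFN API.

```python
# Illustrative only: the import path and the attributes used to list model
# filenames are assumptions, not the verified TabPFN API.
import pytest

try:
    from tabpfn.model.loading import ModelSource  # hypothetical import path

    # Hypothetical accessor returning the shipped classifier checkpoint names.
    MODEL_FILENAMES = list(ModelSource.get_classifier_v2().filenames)
except Exception:  # keep the sketch importable even without TabPFN installed
    MODEL_FILENAMES = []


@pytest.mark.parametrize("model_name", MODEL_FILENAMES)
def test_each_model_initializes(model_name):
    # Use an explicit model (never "auto") so a failure points at one artifact.
    from tabpfn import TabPFNClassifier

    clf = TabPFNClassifier(model_path=model_name)  # model_path kwarg is assumed
    assert clf is not None
```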
Public API Changes
(Only test code is modified.)
How Has This Been Tested?
Checklist
- CHANGELOG.md entry: N/A (tests only).
Implementation Notes
- _full_grid: exhaustive combos only for the first model path.
- _smoke_grid: a single, fast combo for each remaining model path.
- all_combinations = list(_full_grid) + list(_smoke_grid), as sketched below.
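A minimal sketch of the split; the model names and parameter axes below are placeholders, not the actual test fixtures.

```python
# Illustrative sketch of the full-grid / smoke-grid split; model names and
# parameter axes are placeholders, not the real fixture values.
import itertools

model_paths = ["model_a.ckpt", "model_b.ckpt", "model_c.ckpt"]  # discovered programmatically
devices = ["cpu", "cuda"]
fit_modes = ["fit_preprocessors", "low_memory"]

# Full grid: every combination, but only for the first model path.
_full_grid = list(itertools.product(model_paths[:1], devices, fit_modes))

# Smoke grid: one fast combination for each remaining model path.
_smoke_grid = [(path, "cpu", fit_modes[0]) for path in model_paths[1:]]

all_combinations = list(_full_grid) + list(_smoke_grid)
```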
Prior Work / Acknowledgements
Big thanks to the work in PR #437 “Run tests with all models” (by @martino-vic), which motivated this direction and highlighted the need to validate every shipped model. That PR is good prior art but is currently not mergeable (pending checks/CLA and minor integration issues). This patch folds in the workable pieces (programmatic model discovery + targeted grid split) and aligns them with the current test suite and style rules.