Conversation

@oscarkey (Contributor)
We're seeing prediction errors when multiple GPUs are used. Switch to only using multiple GPUs if explicitly enabled, while we debug.

Tested manually on a machine with 2 GPUs and a dataset exhibiting the issue:

  • current main: inference does not work
  • this PR with device=auto: inference works
  • this PR with device=["cuda:0","cuda:1"]: inference does not work

Also, update the `device` docstring: don't mention multi-GPU inference for now, until we've fixed it.
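The workaround can be illustrated with a minimal sketch. The function and parameter names below are assumptions for illustration, not TabPFN's actual API; the point is only the selection rule: with `device="auto"`, pick a single GPU rather than all of them.

```python
def infer_device(device: str = "auto", available_gpus: int = 0) -> str:
    """Hypothetical sketch of the temporary fix described above.

    With device="auto", select only the first CUDA device even when
    several are present. Multi-GPU inference must be requested
    explicitly (and currently triggers the bug under investigation).
    """
    if device == "auto":
        # Temporary workaround: never auto-select more than one GPU.
        return "cuda:0" if available_gpus >= 1 else "cpu"
    # An explicit device specification is passed through unchanged,
    # so multi-GPU use remains possible when opted into.
    return device
```

Under this sketch, `infer_device("auto", available_gpus=2)` returns `"cuda:0"`, matching the manual test results above.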

Commit message: We're seeing poor prediction quality when multiple GPUs are used. Switch to only using multiple GPUs if explicitly enabled, while we debug.
@oscarkey oscarkey requested review from bejaeger and noahho September 17, 2025 08:57
@gemini-code-assist (bot) left a comment
Code Review

This pull request addresses a bug with multi-GPU inference when device="auto" by correctly defaulting to a single GPU (cuda:0) as a temporary fix. The implementation change in infer_devices is simple and effective. The accompanying docstring updates in TabPFNClassifier and TabPFNRegressor clearly communicate this new behavior to users.

My main concern, which is critical, is that this change breaks an existing unit test. I've left a specific comment with a suggested fix to ensure the test suite passes. Please address this to maintain code quality and test coverage.

@noahho (Collaborator) left a comment

LGTM barring Gemini's comment

@oscarkey oscarkey merged commit 68093c6 into main Sep 17, 2025
10 checks passed
@oscarkey oscarkey deleted the ok-disable-multigpu branch September 17, 2025 09:28
oscarkey added a commit that referenced this pull request Nov 12, 2025
…ly select the first. (#157)

* Record copied public PR 517

* If device="auto" and multiple GPUs are present, only select the first. (#517)

(cherry picked from commit 68093c6)

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>